Abstract


Recent developments in the field of digital design and hardware verification have found great use for restricted forms of branching programs. In particular, oblivious read-once branching programs (also called “OBDD’s”) are central to a very common technique for verifying circuits. These programs are useful because they are easily manipulated and compared for equivalence. However, their utility is limited because they cannot compute several simple functions in polynomial size, most notably integer multiplication. This limitation has prompted the consideration of alternative models, usually restricted classes of branching programs, in the hope of finding one with greater computational power that can still be easily manipulated and tested for equivalence.


Read-once (non-oblivious) branching programs can to some degree be manipulated and tested for equivalence, but it has been an open question whether they can compute integer multiplication in polynomial size. The main result of this thesis proves that they cannot: multiplication requires size $2^{\Omega(\sqrt{n})}$. This is the first lower bound for multiplication on non-oblivious branching programs. By defining the appropriate kind of problem reduction, which we call read-once reductions, we are able to show that our result implies the same asymptotic lower bound for other arithmetic functions.


We also survey known results about the various alternative models, describing the main techniques used for thinking about their computation and for proving lower bounds. These techniques are illustrated with two proofs that have not appeared in the literature. We summarize the known results by taking a structural approach of comparing the complexity classes corresponding to the various models.
CHAPTER 1


Introduction 


Branching programs have recently been found very useful in the field of hardware verification. The central problem of verification is to check whether a combinational hardware circuit has been correctly designed. One approach commonly employed today is to convert independently the circuit description and the function specification to a common intermediate representation and then test whether the two representations are equivalent (e.g., [Br92, We94]). The use of restricted forms of branching programs for the intermediate representation has made this approach feasible and very popular: several software packages are available that implement this very strategy [Kr94, Br92]. This application raises several issues of computational complexity, renewing interest in the low-level complexity of branching programs. This thesis explores some of these issues from a complexity-theoretic point of view.


1.1 The role of branching programs in hardware verification 


Most of the computational models considered as candidates for the intermediate representation are restricted classes of branching programs. A branching program is a directed acyclic graph with a distinguished root node and two sink nodes. The sink nodes are labeled 0 and 1, and each non-sink node is labeled with an input variable $x_i$, $i \in [n]$, and has two outgoing edges, labeled 0 and 1. A branching program computes a Boolean function $f : \{0,1\}^n \to \{0,1\}$ in the natural manner: each assignment of Boolean values to the variables $x_i$ defines a unique path through the graph from the root to one of the sinks; the label of that sink defines the value of the function on that input. The size of a branching program is its number of nodes. Since branching programs are a non-uniform model of computation, asymptotic statements about size refer to families of branching programs containing one program for each input size.
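To make the definition concrete, here is a minimal Python sketch of evaluating a branching program. The encoding is our own assumption, not one prescribed by the text. The sinks are the integers 0 and 1. Every other node has a (hypothetical) name mapped to a triple (i, low, high), meaning “test $x_i$, follow low if $x_i = 0$ and high if $x_i = 1$”. Variables are indexed from 0.

```python
from typing import Dict, Sequence, Tuple

# Assumed encoding (illustrative only): sinks are the ints 0 and 1; every
# other node is a name mapped to (i, low, high), i.e. "test x_i, go to low
# if x_i = 0 and to high if x_i = 1". Variables are indexed from 0.
Program = Dict[object, Tuple[int, object, object]]


def evaluate(program: Program, root: object, x: Sequence[int]) -> int:
    """Follow the unique path that input x selects from the root to a sink."""
    node = root
    while node not in (0, 1):
        i, low, high = program[node]
        node = high if x[i] else low
    return node


def size(program: Program) -> int:
    """Number of nodes, counting the two sinks."""
    return len(program) + 2


# Example: x_0 XOR x_1 with three internal nodes (size 5).
XOR = {"a": (0, "b0", "b1"), "b0": (1, 0, 1), "b1": (1, 1, 0)}
print([evaluate(XOR, "a", (p, q)) for p in (0, 1) for q in (0, 1)])  # [0, 1, 1, 0]
print(size(XOR))  # 5
```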


The circuit to be verified is assumed to be an ordinary combinational single-output circuit, built up from a standard basis of Boolean functions such as $\{\wedge, \vee, \neg\}$. The typical algorithm for constructing the intermediate representation from the circuit is to work bottom-up through the circuit, from the inputs to the output, combining the representations appropriately at each gate. Thus, the algorithm need only compute a representation for $f \wedge g$, $f \vee g$, and $\neg f$ when given representations for $f$ and $g$. In the literature, these are called the “synthesis operations”. It is easy to see that arbitrary polynomial-size branching programs are closed under these operations.
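The closure claim can be checked directly. An unrestricted program for $\neg f$ swaps the sinks. A program for $f \wedge g$ (resp. $f \vee g$) redirects every edge into the 1-sink (resp. 0-sink) of the program for $f$ to the root of the program for $g$, so its size is about the sum of the two sizes. The Python sketch below assumes an encoding of our own: the sinks are the integers 0 and 1, and every other node is a string name mapped to (i, low, high). The prefixes "f:" and "g:" are an arbitrary renaming that keeps node names distinct. The result is generally not read-once, since one path may test a variable in both halves.

```python
def evaluate(program, root, x):
    """Follow the path selected by input x; the sinks are the ints 0 and 1."""
    node = root
    while node not in (0, 1):
        i, low, high = program[node]
        node = high if x[i] else low
    return node


def _flip(v):
    return 1 - v if v in (0, 1) else v


def negate(program, root):
    """Program for NOT f: exchange the accepting and rejecting sinks."""
    return ({n: (i, _flip(lo), _flip(hi)) for n, (i, lo, hi) in program.items()},
            _flip(root))


def combine(pf, rf, pg, rg, op):
    """Program for f AND g (op="and") or f OR g (op="or").

    Every edge of f's program entering the sink that does not yet decide the
    answer (1 for AND, 0 for OR) is redirected to g's root. Node names get
    "f:" / "g:" prefixes so the two programs cannot clash."""
    target = 1 if op == "and" else 0

    def in_g(v):
        return v if v in (0, 1) else "g:" + v

    def in_f(v):
        if v == target:
            return in_g(rg)
        return v if v in (0, 1) else "f:" + v

    out = {"g:" + n: (i, in_g(lo), in_g(hi)) for n, (i, lo, hi) in pg.items()}
    out.update({"f:" + n: (i, in_f(lo), in_f(hi)) for n, (i, lo, hi) in pf.items()})
    return out, in_f(rf)


X0 = {"a": (0, 0, 1)}   # the function x_0
X1 = {"a": (1, 0, 1)}   # the function x_1 (same node name, on purpose)
AND, r_and = combine(X0, "a", X1, "a", "and")
OR, r_or = combine(X0, "a", X1, "a", "or")
NOT0, r_not = negate(X0, "a")
print(AND)  # {'g:a': (1, 0, 1), 'f:a': (0, 0, 'g:a')}
```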


This strategy for verification has several shortcomings that are immediately apparent. First, unrestricted polynomial-size branching programs compute exactly those functions in non-uniform logspace. A restricted form of branching program is no more powerful than an unrestricted one. Therefore, if the intermediate representation is a restricted form of branching program, we clearly cannot hope for a general algorithm to compute a polynomial-size representation (polynomial in the size of the original circuit) unless L/poly = P/poly. This difficulty has largely been accepted as inherent and not critical, since functions computed at the level of hardware are generally not complex and are in fact in L anyway. A second observation is that efficient algorithms for the individual synthesis operations do not imply that the resulting bottom-up algorithm for computing a representation is efficient. For example, if the output of each operation has size equal to the product of the sizes of its input representations, the final representation can have size exponential in the size of the original circuit. Despite this problem, researchers have been content with the bottom-up algorithm as long as each synthesis operation can be performed efficiently.
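For a rough sense of the blow-up, consider a hypothetical worst case of our own choosing. The circuit is a chain of $m$ gates. Gate $j$ combines the representation built so far, of size $S_{j-1}$, with a representation of size $c \ge 2$ for one input, and each synthesis step outputs a representation whose size is the product of its input sizes. Then

```latex
S_0 = c, \qquad S_j = c \cdot S_{j-1} \quad (1 \le j \le m)
\qquad\Longrightarrow\qquad S_m = c^{\,m+1} = 2^{(m+1)\log_2 c},
```

so the final representation is exponential in the circuit size $m$, even though each individual step runs in time polynomial in the sizes of its inputs.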


Finally, there is the problem of testing whether the two branching programs, corresponding to the circuit and the specification, are equivalent. It is easy to see that this problem is co-NP-complete. Membership in co-NP is clear, since an input on which the two programs differ is a short certificate of inequivalence. For hardness, given a 3-CNF formula with variables $\{x_1, \ldots, x_n\}$, we may construct a branching program on the same variables that accepts exactly when the formula is satisfied. A polynomial-time algorithm for equivalence would then give a polynomial-time algorithm for 3-SAT: compare this program with the trivial branching program that always rejects. To say they are not equivalent is to say the formula is satisfiable.
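The hardness reduction can be sketched in Python. The construction below is one natural choice, assumed for illustration. Clauses are tuples of nonzero integers in DIMACS style: $+i$ stands for $x_i$, $-i$ for $\neg x_i$, and $x_i$ is stored at index $i-1$. The program tests the literals of each clause in turn. It moves on to the next clause as soon as one literal is satisfied and rejects if none is. It has at most $3m$ internal nodes for $m$ clauses, but it is generally not read-once. The exhaustive `equivalent` function is only a stand-in for the hypothetical polynomial-time equivalence test and takes $2^n$ steps.

```python
from itertools import product


def evaluate(program, root, x):
    """Follow the path selected by x; the sinks are the ints 0 and 1."""
    node = root
    while node not in (0, 1):
        i, low, high = program[node]
        node = high if x[i] else low
    return node


def cnf_to_bp(clauses):
    """Branching program accepting exactly the satisfying assignments.

    Node (j, t) tests the t-th literal of clause j. Clauses must be nonempty."""
    m = len(clauses)

    def clause_start(j):
        return 1 if j == m else (j, 0)   # past the last clause: accept

    program = {}
    for j, clause in enumerate(clauses):
        for t, lit in enumerate(clause):
            satisfied = clause_start(j + 1)
            failed = (j, t + 1) if t + 1 < len(clause) else 0
            # x = 1 satisfies a positive literal, x = 0 a negative one.
            low, high = (failed, satisfied) if lit > 0 else (satisfied, failed)
            program[(j, t)] = (abs(lit) - 1, low, high)
    return program, clause_start(0)


def equivalent(p1, r1, p2, r2, n):
    """Exhaustive check over all 2^n inputs (stand-in for a fast test)."""
    return all(evaluate(p1, r1, x) == evaluate(p2, r2, x)
               for x in product((0, 1), repeat=n))


def satisfiable(clauses, n):
    """Satisfiable iff the formula's program differs from 'always reject'."""
    program, root = cnf_to_bp(clauses)
    return not equivalent(program, root, {}, 0, n)


print(satisfiable([(1, -2, 3), (-1, 2, -3)], 3))   # True
print(satisfiable([(1, 2), (-1,), (-2,)], 2))      # False
```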


1.2 Restricted branching programs 


Because of the difficulty of comparing arbitrary branching programs for equivalence, the intermediate representation is instead chosen to be a restricted class of branching programs, namely oblivious read-once branching programs, or OBDD’s (“ordered binary decision diagrams”).


Definition 1 A branching program is read-once if on every path from the source to a sink, each variable appears at most once as the label of a vertex.


Definition 2 A branching program is oblivious if on every path from the source to a sink, the variables appear in the same order.


Our definition of oblivious is slightly different from the usual definition, which requires the branching program to be leveled (for each node, all paths from the source to that node have the same length), with each node at a given level labeled with the same variable. Our definition does not require leveling; it is easy to see that any oblivious program may be leveled at a cost in size of a factor of $n$, the number of variables. Since we will primarily be concerned with polynomial versus exponential growth, we will assume leveled programs or not, as convenient.
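Both definitions can be checked mechanically. The Python sketch below uses an encoding of our own: the sinks are the integers 0 and 1, and every other node is a name mapped to (i, low, high). It reads “the same order” as consistency with a single fixed permutation of the variables, as in the usual definition.

- `is_read_once` checks that no node labeled $x_i$ can reach another node labeled $x_i$.
- `obdd_order` builds a graph on the variables with an arc from $x_i$ to $x_j$ whenever an edge leads from a node labeled $x_i$ to one labeled $x_j$. It returns a topological order if this graph is acyclic, which forces the program to be both read-once and oblivious, and `None` otherwise.

Only nodes reachable from the root are considered. The sketch relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def is_read_once(program, root):
    """True iff no path from the root tests the same variable twice."""
    below = {}  # node -> variables tested at or below that node

    def vars_below(v):
        if v in (0, 1):
            return frozenset()
        if v not in below:
            i, lo, hi = program[v]
            below[v] = frozenset({i}) | vars_below(lo) | vars_below(hi)
        return below[v]

    vars_below(root)  # fills `below` for every node reachable from the root
    return all(program[v][0] not in vars_below(program[v][1]) | vars_below(program[v][2])
               for v in below)


def obdd_order(program, root):
    """A variable order obeyed by every path, or None if none exists."""
    preds = {}  # variable -> variables read immediately before it on some edge
    seen, stack = set(), [root]
    while stack:
        v = stack.pop()
        if v in (0, 1) or v in seen:
            continue
        seen.add(v)
        i, lo, hi = program[v]
        preds.setdefault(i, set())
        for w in (lo, hi):
            if w not in (0, 1):
                preds.setdefault(program[w][0], set()).add(i)
                stack.append(w)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:  # also raised for a self-loop (a variable read twice)
        return None


XOR = {"a": (0, "b0", "b1"), "b0": (1, 0, 1), "b1": (1, 1, 0)}
# Read-once but not oblivious: x1 before x2 on one branch, x2 before x1 on the other.
MIXED = {"a": (0, "b", "c"), "b": (1, "d", 1), "d": (2, 0, 1),
         "c": (2, "e", 1), "e": (1, 0, 1)}
TWICE = {"a": (0, "b", 1), "b": (0, 0, 1)}  # reads x0 twice on one path
print(obdd_order(XOR, "a"), is_read_once(MIXED, "a"), obdd_order(MIXED, "a"))
# [0, 1] True None
```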


Thus, OBDD’s may be thought of as non-uniform acyclic finite-state automata. Notice that the read-once property implies that an OBDD is satisfiable exactly when there exists a path from the source to the accepting sink: since no variable appears more than once on any path, there is a consistent assignment to the variables corresponding to that path. An OBDD for $\neg f$ is trivially constructed by exchanging the accepting and rejecting sinks. Given two OBDD’s for $f$ and $g$ that obey the same ordering of the variables, an OBDD for $f \wedge g$ or $f \vee g$ is easily constructed using the standard product constructions for finite automata. (This last statement is not true if the two OBDD’s do not obey the same ordering; see Section 2.2.1.) It follows that two OBDD’s obeying the same ordering are easily tested for equivalence by testing their exclusive-or for satisfiability.
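The product construction and the resulting equivalence test can be sketched as follows. We use the recursive form usually called “apply” in the OBDD literature. This is an illustrative Python sketch under assumptions of our own:

- the sinks are the integers 0 and 1, and every other node maps to (i, low, high);
- both programs obey the variable order `order`;
- `op` is any binary Boolean operation on {0, 1}.

A product node is the pair (u, v) of current nodes. Whichever program's node tests the earlier variable advances, while the other waits. The result is not reduced, so it may contain redundant nodes, but it has at most $|P| \cdot |Q|$ of them.

```python
import operator


def evaluate(program, root, x):
    """Follow the path selected by x; the sinks are the ints 0 and 1."""
    node = root
    while node not in (0, 1):
        i, low, high = program[node]
        node = high if x[i] else low
    return node


def apply_op(p, rp, q, rq, order, op):
    """OBDD for op(f, g) from OBDDs (p, rp) for f and (q, rq) for g."""
    pos = {var: k for k, var in enumerate(order)}
    end = len(order)  # the "level" of a sink

    def level(prog, u):
        return end if u in (0, 1) else pos[prog[u][0]]

    result = {}

    def build(u, v):
        if u in (0, 1) and v in (0, 1):
            return op(u, v)
        if (u, v) not in result:
            k = min(level(p, u), level(q, v))
            u0, u1 = p[u][1:] if level(p, u) == k else (u, u)
            v0, v1 = q[v][1:] if level(q, v) == k else (v, v)
            result[(u, v)] = (order[k], build(u0, v0), build(u1, v1))
        return (u, v)

    return result, build(rp, rq)


def satisfiable(program, root):
    """For a read-once program: is the 1-sink reachable from the root?"""
    seen, stack = set(), [root]
    while stack:
        u = stack.pop()
        if u == 1:
            return True
        if u != 0 and u not in seen:
            seen.add(u)
            stack.extend(program[u][1:])
    return False


def equivalent(p, rp, q, rq, order):
    """f == g iff f XOR g is unsatisfiable."""
    return not satisfiable(*apply_op(p, rp, q, rq, order, operator.xor))


X0 = {"a": (0, 0, 1)}                     # x_0
X1 = {"b": (1, 0, 1)}                     # x_1
AND = {"r": (0, 0, "s"), "s": (1, 0, 1)}  # x_0 AND x_1, built by hand
conj, root = apply_op(X0, "a", X1, "b", [0, 1], operator.and_)
print(equivalent(conj, root, AND, "r", [0, 1]))  # True
print(equivalent(conj, root, X0, "a", [0, 1]))   # False
```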


Because of the tractability of these operations on OBDD’s, they have been the intermediate representation of choice. However, OBDD’s are clearly a very weak model of computation, and the question arises whether they are sufficiently powerful to meet the needs at hand. The answer is yes, for the most part. OBDD’s can compute in polynomial size such functions as integer addition, symmetric Boolean functions, and many of the benchmark functions used by the verification community [BF85]. There is, however, a very important exception: exponential size is required to compute integer multiplication [Br91]. This is a serious setback to the viability of OBDD’s, since the hardware to be tested typically contains circuits that perform multiplication. Today, the largest multipliers that can be checked using this method have 12-bit inputs; ideally, circuit designers would like to check multipliers of 32 or even 64 bits.


Thus, despite the success of this approach, great effort has also been expended to find another model that is as easily manipulated but has greater computational power (e.g., [SDG94, SW95]). Most of these models, such as k-OBDD’s, k-IBDD’s, and nondeterministic OBDD’s, have proven too weak to compute multiplication in polynomial size (see Chapter 2). A common feature of these models is that they are all oblivious branching programs. It is therefore natural to consider non-oblivious programs, the simplest of these being read-once programs.


Unfortunately, read-once programs do not enjoy quite the same degree of manipulability as OBDD’s. Determining whether a read-once program is satisfiable is as simple as for an OBDD, since the read-once property implies that the program is satisfiable exactly when there is a path from the source to the accepting sink. Testing equivalence is also reasonably tractable: although it is not known how to do so in deterministic polynomial time, there is a randomized polynomial-time algorithm with one-sided error due to Blum, Chandra, and Wegman [BCW80]. The synthesis operations, however, are provably not tractable: there exist functions $f$ and $g$ that each have polynomial-size read-once programs but whose conjunction $f \wedge g$ requires exponential-size read-once programs. Despite their relative recalcitrance, read-once programs have been considered by some researchers for possible use in hardware verification [GM94]. Until now, however, very little was known about the complexity of multiplication with any non-oblivious programs.


In this thesis, we prove that multiplication requires (non-oblivious) read-once branching programs of size $2^{\Omega(\sqrt{n})}$. This is the first superpolynomial lower bound for multiplication on non-oblivious branching programs. This result demonstrates that relaxing the ordering restriction of OBDD’s is insufficient to gain the desired computational power, and thus further strengthening of the model is needed. By defining the appropriate kind of problem reduction, which we call read-once reductions, we are able to show that our result implies the same asymptotic lower bound for other arithmetic functions.


Chapter 2 considers in some detail the other models, all essentially generalizations of OBDD’s. In addition to summarizing the lower bounds that are known for functions in the various models, we compare the classes of functions that are computable in polynomial size by the models, and also describe the techniques available for proving lower bounds in the different models. Included are two simple proofs that have not appeared in the literature. Chapter 3 gives the lower bound for multiplication and the problem reductions; Chapter 4 concludes with statements of the interesting open problems.
CHAPTER 2


Related models 


In the search for alternatives to OBDD’s, many models have been considered. In addition to their relevance for hardware verification, they are interesting also for the questions of structural complexity that they raise.


This chapter begins by summarizing the various extensions to OBDD’s and read-once programs, including adding nondeterminism and allowing variables to be read k times. These different models are compared in two respects: (1) the ease with which such programs are manipulated, and (2) their computational power.


Section 2.3 summarizes the known lower bounds. We then take a structural view of the relationships between the classes of functions computable in polynomial size for the various models. We will see that the two restrictions, obliviousness and limited reading, are orthogonal to each other. With respect to polynomial size, there are functions that can be computed with read-once programs but cannot be computed by oblivious read-k-times programs for any constant k. Yet at the same time, there are functions computable by oblivious read-k-times programs that cannot be computed by (non-oblivious) read-once programs. We will also consider the hierarchies with respect to k in the various models.


Section 2.5 briefly outlines the primary techniques for proving lower bounds, including two proofs that have not appeared in the literature. In Section 2.6 we discuss the problem of integer multiplication and describe the known lower bounds. Finally, in Section 2.7 we mention two related issues.


2.1 Definitions 


We begin with the definitions of the various extensions to the basic models. Recall that in a read-once branching program, each variable appears at most once on every path from the source to a sink. An OBDD is an oblivious read-once branching program: each path through the program inspects the variables in the same order, each at most once.


Two recently proposed models, which we shall not consider here, are “graph-driven BDD’s” [SW95] and “binary moment diagrams” [BC94]. The latter are not branching programs and do not compute a Boolean function, but they do allow polynomial-size representation of multiplication. Also, in [S95] lower bounds are proved for branching programs in which, for each path, the number of variables appearing more than once is bounded by k. In [MW95], lower bounds are proved for nondeterministic programs in which each path obeys a bound on the number of alternations between sets of variables.


2.1.1 Reading each variable k times 


There are essentially three models of branching programs in which each variable may be read multiple times:

1. k-OBDD’s (also known as k-BDD’s [BSSW93]). On each path the variables appear at most k times each, in an order that is the same permutation repeated k times.

2. k-IBDD’s. On each path the variables appear at most k times each, in an order that is the concatenation of k (possibly different) permutations.

3. Read-k-times programs. On each path the variables appear at most k times each.


We remark that our definition of read-k-times programs prevents a variable from appearing more than k times on any path from the source to either sink. These are sometimes referred to as syntactic read-k-times programs. They contrast with semantic read-k-times programs, in which the limited reading need hold only for those paths that some input may follow; untraversable paths need not obey the read-k-times restriction. (The two definitions are equivalent for k = 1.) The “semantic” definition is perhaps more natural from the point of view of algorithms (upper bounds). The “syntactic” definition is more combinatorial and more amenable to proving lower bounds. No lower bounds (for explicit functions) are known for semantic read-k-times programs.


2.1.2 Nondeterminism 


The simplest and most common way to introduce nondeterminism is to permit some nodes to be unlabeled and to allow either of the two outgoing edges to be traversed on any input. Such a program is said to accept if the input may follow some path from the root to an accepting sink. Equivalently, such a path exists in the subgraph obtained by removing the edges that the input cannot traverse. It is not surprising that polynomial-size nondeterministic branching programs accept exactly those languages in (nonuniform) NL, nondeterministic logspace.
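Acceptance by a nondeterministic program is thus a reachability question. The Python sketch below is an illustration under an encoding of our own. The sinks are the integers 0 and 1, and a labeled node maps to (i, low, high). An unlabeled node maps to (None, first, second), and both of its edges may be traversed on any input. The search follows only the edges that the given input can traverse.

```python
def accepts(program, root, x):
    """Does some path allowed by input x lead from the root to the 1-sink?"""
    seen, stack = set(), [root]
    while stack:
        v = stack.pop()
        if v == 1:
            return True
        if v == 0 or v in seen:
            continue
        seen.add(v)
        i, e0, e1 = program[v]
        if i is None:            # unlabeled (OR) node: either edge may be taken
            stack.extend((e0, e1))
        else:                    # labeled node: only the edge matching x_i
            stack.append(e1 if x[i] else e0)
    return False


# Example: guess which of x_0, x_1 to inspect; accepts iff x_0 OR x_1.
GUESS = {"g": (None, "t0", "t1"), "t0": (0, 0, 1), "t1": (1, 0, 1)}
print([accepts(GUESS, "g", (p, q)) for p in (0, 1) for q in (0, 1)])
# [False, True, True, True]
```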


We may think of the unlabeled nodes of a nondeterministic branching program as being OR nodes. A standard generalization introduces nodes corresponding to other binary functions. Allowing AND nodes, for example, naturally enables polynomial-size programs to accept languages in co-NL. As NL = co-NL, it happens that allowing AND nodes results in the same power as OR nodes for polynomial-size programs¹. Allowing both AND nodes and OR nodes enables polynomial-size programs to recognize alternating logspace, which is equal to P. By allowing parity nodes, polynomial-size programs recognize $\oplus$L, a logspace analogue of $\oplus$P [KW93]. Meinel [Me89] explores the range of all possibilities and concludes that allowing nodes of other binary Boolean functions does not give classes different from L, NL, P, or $\oplus$L.


¹It is easy to see that the proof of [Im88] yields the same result in the non-uniform case. Given a polynomial-size branching program with OR nodes, that proof constructs another polynomial-size branching program with OR nodes that accepts exactly when the original program rejects. This OR-program for $\neg f$ is easily converted to an AND-program for $f$ by replacing the OR nodes with AND nodes and switching the accepting and rejecting sinks.
Note that we have not introduced nondeterminism as we would with circuits, where we would allow nondeterministic variables as inputs. Defining nondeterminism in this manner immediately gives (nonuniform) NP for polynomial-size programs, since NP is characterized by polynomial-size nondeterministic formulas.


We mention that Borodin, Razborov and Smolensky [BRS93] use a different definition of nondeterminism: Nodes are unlabeled and each edge is either unlabeled or labeled with a variable and a value. Unlabeled edges are considered “free” edges, which may be traversed by any input; labeled edges may of course be traversed only by inputs consistent with the label. The measure of size is the number of labeled edges, rather than the number of nodes. The difference in models is not of consequence for our purposes, as it is easy to see that the two size measures are within a constant factor of each other. Clearly, our nondeterministic branching programs are essentially a special case of theirs, and the number of edges in one of our programs is at most twice the number of nodes. Conversely, a program in their form is easily converted to one of our form in which the number of nodes is at most the number of edges in the original program.


There is another model of nondeterministic branching programs, called rectifier-and-switching networks, which Razborov prefers because of the combinatorial characterization its size measure affords (see [Ra91, Ra90]). A rectifier-and-switching network is essentially a nondeterministic branching program as [BRS93] defines them, except that the (directed) graph may contain cycles. There is no “rejecting sink”, and the program accepts exactly when there exists at least one path from the source to the (accepting) sink. The measure of size is the number of labeled edges. Again, our nondeterministic programs are essentially a special case of rectifier-and-switching networks. So for a given function, our programs may be larger, but not by more than a quadratic factor, as the following transformation demonstrates. To make a network of F edges acyclic, place F copies of it in sequence. Redirect each original “back edge” (an edge that leads to a node that is not further from the root) to the copy of its destination node in the subsequent copy of the graph. At most F copies are needed, since if there is an accepting path then there is a simple one, which contains at most F edges, and an extra copy of the graph is needed only for each back edge in the path. Thus, at a cost of squaring the size, we obtain a nondeterministic program in the sense of [BRS93]. It is not known whether this measure is within a constant factor of the other two [Ra91, Open Question #1].
2.2 Manipulating branching programs


As explained in Chapter 1, OBDD’s have the useful property that they are easily manipulated: given OBDD’s for $f$ and $g$ that obey the same ordering of the variables, it is easy to construct an OBDD for $f \wedge g$ or $f \vee g$. Since the satisfiability of an OBDD is equivalent to the reachability of the accepting sink, OBDD’s are also easily tested for satisfiability and thus equivalence. The synthesis operations of constructing OBDD’s for $f \wedge g$ and $f \vee g$ are intractable if the two given OBDD’s do not obey the same ordering (as shown below). We remark, however, that this condition is not necessary for testing equivalence: there is a polynomial-time algorithm due to Fortune, Hopcroft, and Schmidt [FHS78] for testing whether an OBDD is equivalent to a read-once program, which can be used in this case.


2.2.1 Read-once programs 


Read-once programs do not enjoy quite the same degree of manipulability as their oblivious version, OBDD’s. The read-once property implies that the program is satisfiable exactly when there is a path from the source to the accepting sink. However, the synthesis operations are provably not tractable: there exist functions $f$ and $g$ that each have polynomial-size read-once programs but whose conjunction $f \wedge g$ requires exponential-size read-once programs. One such example is the function $\pi$-MATRIX, which determines whether an $n \times n$ (0,1)-matrix is a permutation matrix. Equivalently, it determines whether a bipartite graph with vertex classes $V$ and $W$, where $|V| = |W| = n$, is exactly a perfect matching (with no further edges). $\pi$-MATRIX requires exponential-size read-once programs (see Section 2.3.3). On the other hand, it is easy to test that each row has exactly one 1, or that each column has exactly one 1. In fact, these two functions are easily computed by OBDD’s (with different orderings of the variables), and $\pi$-MATRIX is true exactly when both of these functions are true.
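The two easy halves of $\pi$-MATRIX can be made concrete. The Python sketch below is our own illustrative construction. Entry $(r, c)$ of the matrix is the variable with flat index $rn + c$. The sinks are the integers 0 and 1, and internal nodes map to (i, low, high). `line_obdd(n)` reads the entries row by row and remembers only whether the current row has contained a 1 so far. It therefore has 2 nodes per variable, $2n^2$ internal nodes in all. `line_obdd(n, by_columns=True)` does the same in column-major order. The conjunction of the two programs is $\pi$-MATRIX, but because their orders differ, the OBDD product construction does not apply.

```python
def evaluate(program, root, x):
    """Follow the path selected by x; the sinks are the ints 0 and 1."""
    node = root
    while node not in (0, 1):
        i, low, high = program[node]
        node = high if x[i] else low
    return node


def line_obdd(n, by_columns=False):
    """OBDD accepting iff every row (or every column) has exactly one 1."""
    def var(k):  # flat index of the k-th entry read
        line, pos = divmod(k, n)
        return pos * n + line if by_columns else line * n + pos

    def succ(k, t):
        """Next node after the k-th read, with t ones seen in the current line."""
        if t > 1:
            return 0                     # a second 1 in this line: reject
        if k % n == n - 1:               # the current line is complete
            if t == 0:
                return 0                 # no 1 in this line: reject
            return 1 if k == n * n - 1 else ("v", k + 1, 0)
        return ("v", k + 1, t)

    program = {("v", k, s): (var(k), succ(k, s), succ(k, s + 1))
               for k in range(n * n) for s in (0, 1)}
    return program, ("v", 0, 0)


def is_perm_matrix(x, n):
    rows, r0 = line_obdd(n)
    cols, c0 = line_obdd(n, by_columns=True)
    return evaluate(rows, r0, x) & evaluate(cols, c0, x)


print(is_perm_matrix([0, 1, 0, 0, 0, 1, 1, 0, 0], 3))  # 1
print(is_perm_matrix([1, 1, 0, 0, 0, 1, 0, 0, 0], 3))  # 0
```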


It is not known how to determine the equivalence of two read-once programs in deterministic polynomial time. Blum, Chandra, and Wegman [BCW80] give a co-RP algorithm: it may say “equivalent” when in fact the programs are not, but never vice versa. The algorithm randomly assigns values from a finite field to the literals and then computes the value of the DNF polynomial of the function.
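The idea can be sketched in Python, in the spirit of [BCW80]. This is not a transcription of their algorithm: the field size, the path-weight formulation, and all names are our own choices.

Replace $x_i$ by $a_i$ and $\neg x_i$ by $1 - a_i$. Summing, over all paths to the accepting sink, the product of the edge values gives a polynomial that agrees with the function on $\{0,1\}^n$. Because no variable repeats on a path, this polynomial is multilinear. A Boolean function has only one multilinear representation, so equivalent programs give identical polynomials. Inequivalent ones disagree at a random point of $\mathrm{GF}(p)^n$ with probability at least $1 - n/p$.

The sum over paths is computed by pushing weights through the graph in topological order. Programs use an assumed encoding: sinks 0 and 1, other nodes mapped to (i, low, high).

```python
import random
from collections import defaultdict

PRIME = 2**61 - 1  # a Mersenne prime; arithmetic is in GF(PRIME)


def poly_value(program, root, a, p=PRIME):
    """Sum over root-to-1 paths of prod(a_i on 1-edges, 1 - a_i on 0-edges), mod p."""
    postorder, seen = [], set()

    def visit(v):  # depth-first search from the root
        if v in (0, 1) or v in seen:
            return
        seen.add(v)
        _, lo, hi = program[v]
        visit(lo)
        visit(hi)
        postorder.append(v)

    visit(root)
    weight = defaultdict(int)
    weight[root] = 1
    for v in reversed(postorder):  # reverse postorder is a topological order
        i, lo, hi = program[v]
        weight[lo] = (weight[lo] + weight[v] * (1 - a[i])) % p
        weight[hi] = (weight[hi] + weight[v] * a[i]) % p
    return weight[1]


def probably_equivalent(p1, r1, p2, r2, n, trials=3, rng=random):
    """One-sided test for read-once programs: a False answer is always correct."""
    for _ in range(trials):
        a = [rng.randrange(PRIME) for _ in range(n)]
        if poly_value(p1, r1, a) != poly_value(p2, r2, a):
            return False
    return True


XOR_A = {"a": (0, "b0", "b1"), "b0": (1, 0, 1), "b1": (1, 1, 0)}  # reads x0 first
XOR_B = {"c": (1, "d0", "d1"), "d0": (0, 0, 1), "d1": (0, 1, 0)}  # reads x1 first
OR = {"e": (0, "f", 1), "f": (1, 0, 1)}
rng = random.Random(0)
print(probably_equivalent(XOR_A, "a", XOR_B, "c", 2, rng=rng))  # True
print(probably_equivalent(XOR_A, "a", OR, "e", 2, rng=rng))     # False (w.h.p.)
```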
2.2.2 Nondeterministic read-once programs


Obviously, OR nodes trivialize the synthesis operation of constructing a program for $f \vee g$. They do not help, however, with constructing $f \wedge g$. The lower bound for $\pi$-MATRIX is actually proved for read-once programs with OR nodes, so a nondeterministic read-once program for $f \wedge g$ may require size exponential in the sizes of the programs for $f$ and $g$. This result may be contrasted with NL = co-NL, which says that polynomial-size branching programs with OR nodes are equivalent to polynomial-size branching programs with AND nodes. We now see that if we restrict the programs to be read-once, OR nodes and AND nodes give different computational power [KMW91]. The same phenomenon occurs for linear-length oblivious programs [KMW92].


Determining the satisfiability of a program with AND nodes is NP-complete by the example in Section 2.2.4. For OR nodes, testing equivalence is trivially at least as hard as testing the equivalence of deterministic read-once programs, which is not known to be in P. In the case of PARITY nodes, the algorithm of [BCW80] works as long as the field used has characteristic 2 [SDG94]. In [SDG94], simple but very restrictive conditions on the use of AND and OR nodes are given under which the correctness of the algorithm of [BCW80] is retained.


2.2.3 k-OBDD’s 


By restricting the order to be the same permutation repeated k times, we retain the property that two programs obeying the same ordering are easily combined: the usual product construction works as it does for OBDD’s.


k-OBDD’s can also be tested for satisfiability, though with a little more effort. Regard the program as k separate segments corresponding to the k repetitions of the permutation in which the variables are read. If the size, and hence the width, is polynomial in n, then there are a polynomial number of nodes at the top of each segment. The portion of a segment between a particular top node and a “bottom” node (at the top of the subsequent segment) may be viewed as an OBDD. For an input to pass through a given sequence of k “top nodes”, it must satisfy the conjunction of the k corresponding OBDD’s (with source and accept nodes defined appropriately). To test whether these k OBDD’s are simultaneously satisfiable, we may construct an equivalent OBDD using the synthesis operation for OBDD’s (since these k OBDD’s obey the same ordering) and then check it for satisfiability. There are $(\mathrm{poly})^k = \mathrm{poly}$ sequences of “top nodes” that an input may follow. The k-OBDD is satisfiable if and only if one of these sequences is traversable by some input. Thus, to determine whether the k-OBDD is satisfiable, we sequentially check whether any of these sequences is traversable.
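The procedure can be sketched in Python under assumptions of our own. A k-OBDD is given as a list of k segments. Segment j maps each of its nodes to (var, low, high). A child that is not a node of segment j is a top node of segment j + 1 (or one of the sinks 0 and 1). Every segment reads the variables consistently with the list `order`.

For each sequence of top nodes, `jointly_satisfiable` runs the k walks in lockstep over tuples of nodes. This is the synthesis step above, performed on the fly. All function names are hypothetical. For fixed k and polynomial width, the running time is polynomial.

```python
from itertools import product


def exits(segment):
    """Children that leave the segment: next-segment top nodes or sinks."""
    out = []
    for _, lo, hi in segment.values():
        for w in (lo, hi):
            if w not in segment and w not in out:
                out.append(w)
    return out


def jointly_satisfiable(segments, starts, targets, order):
    """Is there one input whose walk through each segment j, begun at
    starts[j], leaves that segment at targets[j]?"""
    pos = {var: k for k, var in enumerate(order)}
    n = len(order)

    def level(j, u):
        return pos[segments[j][u][0]] if u in segments[j] else n

    start, goal = tuple(starts), tuple(targets)
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        i = min(level(j, u) for j, u in enumerate(state))
        if i == n:                      # every walk has left its segment
            if state == goal:
                return True
            continue
        for b in (0, 1):                # set x_{order[i]} = b in every walk
            nxt = tuple(segments[j][u][1 + b] if level(j, u) == i else u
                        for j, u in enumerate(state))
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def kobdd_satisfiable(segments, root, order):
    middle = [exits(seg) for seg in segments[:-1]]
    for tops in product(*middle):       # candidate sequences of top nodes
        if jointly_satisfiable(segments, (root,) + tops, tops + (1,), order):
            return True
    return False


FIRST = {"a": (0, "b0", "b1"), "b0": (1, "t0", "t1"), "b1": (1, "t1", "t0")}
SECOND_UNSAT = {"t0": (0, 0, 0), "t1": (0, 0, "c"), "c": (1, 0, 1)}
SECOND_SAT = {"t0": (0, 0, 0), "t1": (0, 0, "c"), "c": (1, 1, 0)}
print(kobdd_satisfiable([FIRST, SECOND_UNSAT], "a", [0, 1]))  # False
print(kobdd_satisfiable([FIRST, SECOND_SAT], "a", [0, 1]))    # True
```

In the unsatisfiable example, the first segment exits at t1 only when $x_0 \ne x_1$, while the second accepts from t1 only when $x_0 = x_1 = 1$. The graph does contain the path a, b1, t1, c, 1, but no single input follows it, which is why plain reachability is not enough once k ≥ 2.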


Other operations on k-OBDD’s are considered in detail in [BSSW93].


2.2.4 k-IBDD’s and read-k-times programs 


Unlike for k-OBDD’s, testing the satisfiability of even 2-IBDD’s, and hence read-2-times programs, is NP-complete. The reduction, from SAT, places two OBDD’s in sequence. The first checks the satisfiability of the formula with each occurrence of a variable uniquely renamed. The second checks whether the corresponding renamed variables have the same value. Since it includes satisfiability as a special case, testing the equivalence of two k-IBDD’s is also hard.


Since a 1-IBDD is simply an OBDD, the example $\pi$-MATRIX implies that the synthesis operations on k-IBDD’s are intractable even for k = 1 if the constructed program must also be a k-IBDD. Naturally, the synthesis operations on a pair of k-IBDD’s are easy if we allow the constructed program to be a 2k-IBDD. The same statements are true for read-k-times programs.


2.3 Previous lower bounds


The restriction of limited reading is severe enough that, in contrast to the case of arbitrary branching programs, many exponential lower bounds have been proved for explicit functions, some of them quite simple.


2.3.1 For oblivious programs 


Exponential lower bounds for the size of OBDD’s are known for many functions. These include HWB (“Hidden-Weighted-Bit”), ACH (“Achilles-Heel”), and integer multiplication, MULT (all defined later), for which lower bounds were proved specifically for OBDD’s. Krause [Kr91] proves lower bounds for other functions. We will have more to say about these lower bounds in Sections 2.4.2, 2.5, and 2.6. Of course, all other lower bounds mentioned below for stronger models imply a fortiori equally strong lower bounds for OBDD’s.


Also, in a very different vein, Alon and Maass [AM88] prove lower bounds for arbitrary oblivious programs of linear length, which do not obey any restriction on the number of times a variable is read. Their lower bound is discussed in Section 2.5.3. In a similar spirit, Krause and Waack [KW91] show that any oblivious program of linear length for the problem of directed s-t connectivity requires exponential size; in [KMW92], similar lower bounds are proved for such programs with nondeterminism added.


Using a lemma from [AM88] and the communication complexity arguments outlined in Section 2.5.1, Gergov [Ge94] proves that computing MULT requires size $2^{\Omega(n)}$ for arbitrary oblivious programs of linear length, even with nondeterministic AND, OR, or PARITY nodes.


2.3.2 For read-once programs 


There has also been great success in proving lower bounds on the size of read-once programs. Many of the functions that require exponential size are very simple; some are easily computed with mere read-twice programs.


Masek [Ma76] was the first to consider read-once programs, proving a lower bound of $\Omega(m^2)$ on the size of any program determining whether $\sum_{i=1}^{n} x_i = m$. Zak [Za84] and later Wegener [We88, We87] proved lower bounds of $2^{\Omega(n)}$ for two functions. The first is $\frac{1}{2}$-CLIQUE, which determines whether a graph on $n$ nodes contains a clique of size $n/2$. The second is $\frac{1}{2}$-CLIQUE-ONLY, which determines whether a graph on $n$ nodes contains an $n/2$-clique and no further edges. (For comparison, there is a simple read-twice program for $\frac{1}{2}$-CLIQUE-ONLY of size $O(n^3)$.) Dunne [Du85] proved a lower bound of $2^{\Omega(n)}$ for the problems of determining whether a graph on $n$ nodes contains a Hamiltonian cycle and determining whether it contains a perfect matching. Simon and Szegedy [SS93], in order to demonstrate their lower bound technique, proved a lower bound of $2^{\Omega(n)}$ for the problem of determining whether a graph on $n$ nodes is $(n/2)$-regular. Note that none of these bounds is fully exponential, since the number of input variables, one for each edge, is $\binom{n}{2}$. Babai, Hajnal, Szemeredi and Turan [BHST87] proved an asymptotically optimal lower bound of $2^{\Omega(n^2)}$ for computing the parity of the number of triangles in a graph on $n$ nodes; Simon and Szegedy [SS93] simplify and refine their analysis, improving the constant in the exponent.


2.3.3 For nondeterministic programs, read-once and read-k-times 


Exponential lower bounds for explicit functions have also been proved for nondeterministic read-once branching programs. Krause, Meinel, and Waack [KMW91] (see also [Ju89]) give a lower bound of $n!/((n/2)!)^2 = 2^{\Omega(n)}$ for the function $\pi$-MATRIX. (It was known earlier that this function required exponential-size deterministic read-once programs; see [Kr91, p. 10] and [Ju86].) Also, Borodin, Razborov and Smolensky [BRS93] prove a lower bound of $2^{\Omega(n)}$ for the functions $\frac{1}{2}$-CLIQUE and $\frac{1}{2}$-CLIQUE-ONLY. Note that the complement of $\frac{1}{2}$-CLIQUE-ONLY can be computed by nondeterministic read-once programs of polynomial size.


Okolnishnikova [Ok91] proves that computing the characteristic function of the Bose-Chaudhuri codes requires deterministic read-k-times programs of size exponential in $\Omega(\sqrt{n}/k^{k})$. Borodin et al. [BRS93] exhibit, for any k, a function that requires nondeterministic read-k-times programs of size exponential in $\Omega(n/(k^{3}4^{k}))$. Jukna [Ju92] extends the results of [BRS93] and [Ok91] to show that the function from [Ok91] requires nondeterministic read-k-times programs of size exponential in $\Omega(\sqrt{n}/k^{2k})$, even though its complement can be computed by nondeterministic read-once programs of polynomial size.


Also, in [MW95], lower bounds are proved for nondeterministic programs in which 
each path obeys a bound on the number of alternations between sets of variables. 
2.4 Comparing the models: classes and structural results 


In this section, we will compare the classes of functions that are computable by polynomial-size programs of the various types. We will use sans-serif font to denote the class of functions computable in polynomial size by the named model. For instance, we will use OBDD to denote the class of functions computable by OBDD’s of polynomial size and READ-k for the functions computable by read-k-times programs of polynomial size. We will also need a notation for the union over all constants k:


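Definition 3

$$\textsf{C-OBDD} = \bigcup_{k \in \mathbb{N}} k\textsf{-OBDD}, \qquad \textsf{C-IBDD} = \bigcup_{k \in \mathbb{N}} k\textsf{-IBDD}, \qquad \textsf{READ-C} = \bigcup_{k \in \mathbb{N}} \textsf{READ-}k$$

(where C is for “constant”).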


We will use OBLIV-LINEAR to denote the class of functions computable with oblivious programs of linear length and polynomial size. Note that
k-OBDD ⊆ C-OBDD ⊆ C-IBDD ⊆ OBLIV-LINEAR.


The results presented in this section are summarized in Figure 2.1, which gives the inclusion relations of these various classes.


2.4.1 Hierarchies in k 


It is known that the hierarchy over k of functions computable by k-OBDD’s of polynomial size is strict: k-OBDD ⊊ (k+1)-OBDD [BSSW95]. For the case k = 1, we may refer to the function HWB, described below, which is in 2-OBDD but not OBDD. For k-IBDD’s the hierarchy is also strict: k-IBDD ⊊ (k+1)-IBDD [BSSW95]. These lower bounds are based on the well-known “rounds hierarchy” for communication complexity exhibited by the “k-pointer-chasing” function, k-PTR, on bipartite graphs [PS82, DGS84, Mc86, HR88, NW91] (in particular the result of [NW91]).


It is not known whether the corresponding hierarchy for read-k-times programs is strict, except for the case k = 1, where we have seen that π-MATRIX ∉ READ-1 but π-MATRIX ∈ 2-IBDD ⊆ READ-2. Simon and Szegedy [SS93] conjecture that the problem of testing the regularity of hypergraphs, which (for the case of ordinary graphs) they showed separates READ-1 from READ-2, will separate the levels of this hierarchy. We reconsider this question in Chapter 4.


2.4.2 Comparing the classes across models 


OBDD ⊊ READ-1; OBDD ⊊ 2-OBDD


It can be shown that the inclusion OBDD ⊆ READ-1 is proper; that is, the ordering restriction does in fact limit the computational power of read-once programs. Demonstrating this separation is the function HWB(x) (“Hidden-Weighted-Bit”), which returns $x_i$ if there are i ones in x, and 0 otherwise (i.e., when x contains no ones). HWB is computable in READ-1 by a clever algorithm that works its way in from the outermost bits of x; it is also easily computed in 2-OBDD. A standard lower bound argument shows that HWB ∉ OBDD ([Br91], see Section 2.5.1).


Also, it is shown in [BHR95] (see also [BSSW93]) that ISA ∉ OBDD, where $\mathrm{ISA}(x,y)\colon \{0,1\}^n \times \{0,1\}^{\lg n} \to \{0,1\}$ is the “Indirect-Storage-Access” function, which returns $x_z$, where z is the integer represented by the y’th block of $\lg n$ bits of x, if $0 \le y < n/\lg n$, and returns 0 if $n/\lg n \le y < n$. It is easy to see that ISA ∈ READ-1 and ISA ∈ 2-OBDD.


k-OBDD ⊄ READ-1 for k > 1.


Furthermore, the classes READ-1 and k-OBDD are incomparable (for any constant k > 1); their models may be thought of as orthogonal restrictions of read-k-times programs. 2-OBDD is separated from READ-1 by the function MHWB (“Multiple-Hidden-Weighted-Bit”), defined on three n-bit vectors x, y, and z as $x_{|y|+|z|} \oplus y_{|x|+|z|} \oplus z_{|x|+|y|}$, where $|a|$ is the Hamming weight of a and the sums are computed modulo n. MHWB has a natural read-twice algorithm in which the variables may be read in the same order each time, so MHWB ∈ 2-OBDD. In [BHR95], it is shown that MHWB ∉ READ-1. Krause [Kr91, Remark 5.3] gives a different function which separates 2-OBDD from READ-1.
Figure 2.1: The inclusion relations among the classes. “C → D” means class C is contained in class D; inclusions that can be inferred by transitivity are not shown. Arrows labeled with problems denote proper inclusions, where the labeling problem separates the two classes. (Problems in parentheses denote separations that can be inferred from others.) The separations denoted with dotted lines show that further inclusions do not hold.
This result is in a sense best possible since 2-OBDD ⊆ READ-2. In the other direction, we have


READ-1 ⊄ C-OBDD.


In [BSSW93], an exponential lower bound is proved for the size of k-OBDD’s for the function ACH (“Achilles-Heel”), defined on $2n + \lg n$ Boolean variables as

$$\mathrm{ACH}(x_0,\dots,x_{n-1},y_0,\dots,y_{n-1},z_1,\dots,z_{\lg n}) = \begin{cases} \bigvee_{0 \le j < n} (x_j \wedge y_j) & \text{if } z = 0,\\ \bigwedge_{0 \le j < n} (x_j \vee y_{j+z}) & \text{if } z \ne 0, \end{cases}$$

where z is the integer represented in binary by $z_1 \cdots z_{\lg n}$ and the sum $j + z$ is computed modulo n. ACH is easily seen to be in READ-1, but a standard lower bound argument shows ACH is not in k-OBDD for any constant k [AGD91, BSSW93].
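As a concrete reference for the definition, here is a small Python sketch of ACH together with the read-once evaluation order mentioned in Section 2.5.4 (read z first, then the pairs $(x_j, y_{j+z})$). The function names and the convention that z is given most-significant bit first are our own assumptions; the sketch only checks that each variable is inspected at most once and does not build an actual branching program.

```python
from itertools import product


def ach(x, y, z):
    """Achilles-Heel function: x, y are 0/1 tuples of length n, z an integer in [0, n)."""
    n = len(x)
    if z == 0:
        return int(any(x[j] and y[j] for j in range(n)))
    return int(all(x[j] or y[(j + z) % n] for j in range(n)))


def ach_read_once(x, y, z_bits):
    """Evaluate ACH reading z first and then the pairs (x_j, y_{j+z}).

    Returns (value, reads), where reads lists every variable inspected.
    z_bits is assumed to be z_1 ... z_{lg n}, most significant bit first.
    """
    n = len(x)
    reads = []
    z = 0
    for t, b in enumerate(z_bits):
        reads.append(("z", t))
        z = 2 * z + b
    for j in range(n):
        k = (j + z) % n                      # partner of x_j for this shift
        reads.append(("x", j))
        reads.append(("y", k))
        if z == 0 and x[j] and y[k]:
            return 1, reads                  # OR case: one true pair decides
        if z != 0 and not (x[j] or y[k]):
            return 0, reads                  # AND case: one false pair decides
    return (0 if z == 0 else 1), reads


def check_exhaustively(n, lg_n):
    """Compare ach_read_once with ach on all inputs, n = 2**lg_n."""
    for x in product((0, 1), repeat=n):
        for y in product((0, 1), repeat=n):
            for zb in product((0, 1), repeat=lg_n):
                z = int("".join(map(str, zb)), 2)
                value, reads = ach_read_once(x, y, zb)
                if value != ach(x, y, z) or len(set(reads)) != len(reads):
                    return False
    return True
```

Since z is read first, the order of the remaining variables can depend on z, which is exactly what an OBDD (with one fixed order) cannot do.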


Krause [Kr91, Remark 5.4] gives a different function which separates C-OBDD from 
READ-1. 


The separation READ-1 ⊄ C-OBDD is subsumed by the following result:


READ-1 ⊄ OBLIV-LINEAR.


This very strong separation is shown using the powerful technique of Alon and Maass [AM88]. They exhibit a function SEQ of 4n bits that is easily in READ-1 but cannot be computed by any polynomial-size oblivious program of length O(n) (see Section 2.5.3). This result exhibits most strongly how severe a computational restriction obliviousness is.


This result is also best possible since OBLIV-LINEAR is the largest of our classes 
not containing READ-1. 


2-IBDD ⊄ C-OBDD.


Clearly, k-OBDD ⊆ k-IBDD for each k; conversely, however, 2-IBDD ⊄ C-OBDD. Again, the separating function is π-MATRIX: π-MATRIX ∈ 2-IBDD easily, but π-MATRIX ∉ C-OBDD. This lower bound is claimed in [Kr91, Remark 5.5], but to the best of our knowledge no proof has appeared, so we give one in Section 2.5.1 (Theorem 1).


This result shows that requiring the k passes to follow the same permutation, rather than allowing k different permutations, is a genuine computational restriction.


Finally, we mention that some functions that are provably outside these classes are easily contained in some of their nondeterministic counterparts. HWB, for example, while not in OBDD, is easily computed by a nondeterministic OBDD that initially branches into n different deterministic OBDD’s, all with a common ordering.
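The nondeterministic OBDD for HWB just described can be sketched in Python as follows (the function names are ours). The root is an OR node that guesses i; branch i is a deterministic pass over $x_1, \dots, x_n$ in the common order whose state is the number of ones seen so far together with the value of $x_i$ once it has been read, so every branch has polynomial width.

```python
from itertools import product


def hwb(x):
    """Hidden-Weighted-Bit: x_i (1-indexed) where i is the number of ones in x; 0 if none."""
    w = sum(x)
    return x[w - 1] if w > 0 else 0


def hwb_branch(i, x):
    """Deterministic pass in the fixed order x_1, ..., x_n.

    State after each level: (ones seen so far, value of x_i if already read),
    i.e. O(n) states per level.  Accepts iff the weight is i and x_i = 1.
    """
    ones, bit_i = 0, 0
    for pos, v in enumerate(x, start=1):
        ones += v
        if pos == i:
            bit_i = v
    return int(ones == i and bit_i == 1)


def hwb_nondeterministic(x):
    """OR node at the root: guess i in 1..n and follow branch i."""
    return int(any(hwb_branch(i, x) for i in range(1, len(x) + 1)))


def agrees_on_all_inputs(n):
    return all(hwb_nondeterministic(x) == hwb(x) for x in product((0, 1), repeat=n))
```

Exactly one branch (i equal to the weight) can accept, and it accepts iff $x_i = 1$, which is the value of HWB.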


2.5 Lower bound techniques 


In this section, we describe the techniques that have been used to prove lower bounds 
in the various oblivious models: OBDD’s, both deterministic and nondeterministic, 
k-OBDD’s and k-IBDD’s, and arbitrary oblivious programs of linear length. For com- 
pleteness as well as for demonstration, we supply a proof of Theorem 1, announced 
in [Kr91] without proof, and also prove Theorem 2, extending in a simple way the 
result of [BSSW93]. We compare these methods with lower bounds for non-oblivious 
programs, but defer a detailed description of the latter until the presentation of our 
own lower bound in Chapter 3. The technique of [BRS93] for proving lower bounds for read-k-times programs will be mentioned only briefly in Chapter 4, when we outline approaches to some open problems.


2.5.1 For OBDD’s, k-OBDD’s, and k-IBDD’s 


Lower bounds for OBDD’s follow a simple strategy: show that for any $Y \subseteq X$ of some fixed size (say $m = m(n)$), there are many (say $2^{\Omega(n)}$) subfunctions on $\bar{Y}$. If the first m variables read by an OBDD are Y, clearly any two assignments to Y that induce different subfunctions on $\bar{Y}$ must lead to different nodes. Since this lower bound holds for any set Y of size m, $2^{\Omega(n)}$ is a lower bound on the number of nodes for any OBDD. Most lower bounds for OBDD’s show explicitly that there are many subfunctions by exhibiting, for any Y of the stated size, an exponential number of settings to Y such that for any two, there is a setting to $\bar{Y}$ on which the respective subfunctions differ.
This argument may be nicely interpreted in terms of communication complexity: one party gets the values of the bits in Y and the other party the bits in $\bar{Y}$. The second party must compute the value of the function based on a single message sent by the first party. An OBDD gives a communication protocol where Y is the first m variables in its ordering. If the program has w nodes at the level immediately following the nodes of Y, then the message has $\lg w$ bits. Thus, if the one-way communication complexity is linear for every Y of size m, then the function requires OBDD’s of exponential size. Bryant [Br91] uses a simple argument of this form to prove that HWB requires exponential-size OBDD’s.
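The quantity behind this argument is easy to compute by brute force for small n. The following Python sketch (a hypothetical helper, exponential in n and meant only for illustration) counts the distinct subfunctions on $\bar{Y}$ induced by the assignments to Y. An OBDD that reads Y first must reach at least this many distinct nodes (sinks included) right after its first |Y| levels, so the one-way message needs at least the logarithm of this number of bits.

```python
from itertools import product


def count_subfunctions(f, n, first):
    """Number of distinct subfunctions f|_{Y=a} on the complement of Y = `first`.

    f takes a list of n bits; `first` lists the indices of Y.  Each subfunction
    is represented by its truth table over the remaining variables.
    """
    rest = [i for i in range(n) if i not in first]
    tables = set()
    for a in product((0, 1), repeat=len(first)):
        table = []
        for b in product((0, 1), repeat=len(rest)):
            x = [0] * n
            for i, v in zip(first, a):
                x[i] = v
            for i, v in zip(rest, b):
                x[i] = v
            table.append(f(x))
        tables.add(tuple(table))
    return len(tables)


def parity(x):
    return sum(x) % 2


def equal_halves(x):
    """1 iff the first half of x equals the second half."""
    h = len(x) // 2
    return int(x[:h] == x[h:])
```

For example, `count_subfunctions(equal_halves, 6, [0, 1, 2])` gives 8, whereas `count_subfunctions(equal_halves, 6, [0, 3])` gives only 2: the count depends on which set Y is read first, which is why the lower bound must hold for every Y.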


Commonly, it is proved that in fact the unlimited-round, two-way communication complexity of the function is linear for any Y of size m. This argument is sometimes made in terms of what is called a “fooling set” for the function f with respect to Y. For $Y \subseteq X$ and $x, x' \in \{0,1\}^n$, let $x_Y$ denote the value of x on the variables in Y, and let $x_Y x'_{\bar{Y}}$ denote the n-bit input string equal to x on the variables in Y and equal to x' on the variables in $\bar{Y}$. A fooling set $F \subseteq \{0,1\}^n$ for f with respect to Y has the property that for all $x \ne x' \in F$, $f(x) = f(x') = 1$ and either $f(x_Y x'_{\bar{Y}}) = 0$ or $f(x'_Y x_{\bar{Y}}) = 0$. Thus, if an OBDD obeys an ordering in which the variables in Y are read first, the setting $x_Y$ cannot lead to the same node at level m as the setting $x'_Y$, since either $f(x_Y x'_{\bar{Y}}) \ne f(x'_Y x'_{\bar{Y}})$ or $f(x'_Y x_{\bar{Y}}) \ne f(x_Y x_{\bar{Y}})$. If for every Y of size m there is a fooling set of exponential size, then the function requires exponential-size OBDD’s. Furthermore, the existence of a fooling set F for Y implies that the (unrestricted) communication complexity with respect to the partition $Y \cup \bar{Y}$ is at least $\lg |F|$. This is seen by inspecting the associated matrix $M_{ij}$, where i (resp., j) ranges over all values of $x_Y$ (resp., $x_{\bar{Y}}$) for $x \in F$, and $M_{x_Y, x'_{\bar{Y}}} = f(x_Y x'_{\bar{Y}})$. Note that the definition of F implies that $x_Y \ne x'_Y$ and $x_{\bar{Y}} \ne x'_{\bar{Y}}$ for $x \ne x'$, so M is a square matrix of dimension $|F|$. M has 1’s on its diagonal because $f(x_Y x_{\bar{Y}}) = 1$, and since either $M_{ij} = 0$ or $M_{ji} = 0$, no two 1’s on the diagonal can appear in the same all-1’s minor. Since a communication protocol of b bits partitions the 1’s of the matrix into at most $2^b$ all-1’s minors, the communication complexity is at least $\lg |F|$.
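A minimal Python sketch of the fooling-set condition just defined (the helper name is our own): it checks that f is 1 on every element of F and that every pair can be “mixed” across Y to produce a 0. The example uses the equality of the two halves of a 2h-bit input, whose diagonal inputs (a, a) form a fooling set of size $2^h$ with respect to the first half.

```python
from itertools import combinations, product


def is_fooling_set(f, F, Y):
    """True iff F is a fooling set for f with respect to the index set Y."""
    def mix(a, b):
        # the input equal to a on Y and to b on the complement of Y
        return tuple(a[i] if i in Y else b[i] for i in range(len(a)))

    if any(f(x) != 1 for x in F):
        return False
    return all(f(mix(x, xp)) == 0 or f(mix(xp, x)) == 0
               for x, xp in combinations(F, 2))


def equal_halves(x):
    h = len(x) // 2
    return int(x[:h] == x[h:])


h = 3
diagonal = [a + a for a in product((0, 1), repeat=h)]   # the inputs (a, a)
```

Here $\lg |F| = h$, so any protocol for this split needs at least h bits, while for parity no two distinct inputs can fool each other in this way.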
For k-OBDD’s


If, for any set Y of size m, there exists a fooling set of size $2^{\Omega(n)}$, then it is also easy to see that the function requires k-OBDD’s of size $2^{\Omega(n/k)}$ [Kr91]. A k-OBDD gives a communication protocol of 2k rounds; the total communication is $2k \lg(\text{width})$, which must be at least $\Omega(n)$, giving the desired bound on the width and hence the size.
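Spelled out, with w the width and assuming a fooling set F with $\lg |F| \ge cn$ for some constant $c > 0$ (in each of the k passes, each player sends the name of a node, i.e., at most $\lceil \lg w \rceil$ bits):

$$2k\,\lceil \lg w \rceil \;\ge\; \lg |F| \;\ge\; cn \quad\Longrightarrow\quad \lg w \;\ge\; \frac{cn}{2k} - 1 \quad\Longrightarrow\quad w \;\ge\; 2^{cn/(2k) - 1} = 2^{\Omega(n/k)}.$$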


For example, we give a simple proof that π-MATRIX ∉ k-OBDD for any constant k.


Theorem 1 π-MATRIX ∉ C-OBDD.


Proof: We will show that for any partition of the $n^2$ variables into two sets X and $\bar{X}$ of equal size, $2^{\Omega(n)}$ is a lower bound on the rank of the matrix of the communication complexity game where player I gets X and player II gets $\bar{X}$.


First notice that for certain partitions, the proof is easy. For example, consider the partition where player I gets the variables in rows 1, ..., n/2 and player II gets the variables in rows n/2+1, ..., n. We may even restrict our attention to only those inputs where each row has exactly one 1 and each player gets exactly n/2 1’s. The inputs to the two players then correspond merely to subsets of the columns; the players accept if the subsets are disjoint and reject otherwise. It is easy to see that this problem requires $\lg \binom{n}{n/2}$ bits of communication, since the $\binom{n}{n/2}$-by-$\binom{n}{n/2}$ matrix of the communication game is diagonal (after indexing player II’s subsets by their complements).
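To see the diagonal structure concretely, this small Python sketch (our own illustration, restricted as in the text to inputs in which each player holds exactly n/2 ones) builds the matrix whose rows and columns are the n/2-subsets of columns held by players I and II, with entry 1 iff the subsets are disjoint. Every row and every column contains exactly one 1, at the complementary subset, so after re-indexing the columns by complements the matrix is the identity of dimension $\binom{n}{n/2}$.

```python
from itertools import combinations
from math import comb, log2


def disjointness_matrix(n):
    """Rows: player I's set of n/2 columns; columns: player II's set.
    Entry 1 iff the two sets are disjoint (then the rows form a permutation matrix)."""
    sets = [frozenset(c) for c in combinations(range(n), n // 2)]
    return sets, [[int(not (a & b)) for b in sets] for a in sets]


def bits_needed(n):
    """lg of the number of diagonal 1's after re-indexing, i.e. lg C(n, n/2)."""
    return log2(comb(n, n // 2))


sets, M = disjointness_matrix(6)
```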


Our proof will follow the spirit of this strategy for arbitrary partitions. Let $r_i$ be the number of X-variables in row i. Order the rows so that $r_1 \le r_2 \le \cdots \le r_n$. We have $|X| = \sum_i r_i = n^2/2$. Let rows n/2+1, ..., n be the “top half” of the matrix.


First consider the case that the top half contains at least 3/4 of the X-variables: $\sum_{i=n/2+1}^{n} r_i \ge 3n^2/8$. In this case, at least 2/3 of the columns have at least n/8 X-variables in their top halves: otherwise, the number of X-variables in the top half is less than

$$\frac{2n}{3}\cdot\frac{n}{2} + \frac{n}{3}\cdot\frac{n}{8} = \frac{3n^2}{8},$$

a contradiction. Since the top half contains exactly half of all the variables, the “bottom half” (rows 1, ..., n/2) has at least 3/4 of the variables in $\bar{X}$. It follows, by the same calculation, that at least 2/3 of the columns have at least n/8 $\bar{X}$-variables in their bottom halves. Therefore at least n/3 columns contain at least n/8 X-variables in the top half and at least n/8 $\bar{X}$-variables in the bottom half. Let C be any subset of n/4 of these columns.


For any subset C′ of half the columns of C, there is a setting to X in which exactly one X-variable in the top half of each column of C′ is 1 and each such 1 appears in a different row. This is because |C′| = n/8 and there are at least n/8 X-variables in the top half of each column of C, so the 1’s can be placed greedily in distinct rows. Let us restrict attention to particular settings to the variables in the columns of C. On X, these settings shall be as described above (for some C′) in the top half, and shall be 0 in the bottom half. On $\bar{X}$, these settings shall be 0 in the top half, and in the bottom half shall contain 1’s in n/8 different rows and different columns C″.


If these two subsets C′ and C″ of columns are complementary ($C' \cup C'' = C$), then there is a setting to the remaining variables for which the input is a permutation matrix, making the function 1. If these two subsets of columns are not complementary ($C' \cup C'' \ne C$), some column in C contains both a 1 in its top half and a 1 in its bottom half, so that for all settings to the remaining variables, the function is 0. We partition these settings to X (I’s inputs) into $\binom{n/4}{n/8}$ blocks, according to which subset of C contains the 1’s in X. Similarly, we partition the settings to $\bar{X}$ (II’s inputs), indexing II’s blocks by the complement $C \setminus C''$. Thus the communication complexity matrix associated with these inputs is a $\binom{n/4}{n/8} \times \binom{n/4}{n/8}$ array of minors, and only the minors on the diagonal contain 1’s. This matrix clearly has rank at least $\binom{n/4}{n/8} = 2^{\Omega(n)}$.

Now consider the case that $\sum_{i=n/2+1}^{n} r_i < 3n^2/8$. In this case, the bottom half has more than $n^2/2 - 3n^2/8 = n^2/8$ X-variables, implying that

$$r_{n/2} \ge \frac{n^2/8}{n/2} = \frac{n}{4},$$

since $r_{n/2}$ is the largest of the $r_i$ in the bottom half. Since $r_i \ge r_{n/2} \ge n/4$ for $i > n/2$, it follows that fewer than 2n/5 rows in the top half have more than 7n/8 X-variables: otherwise, there are more than

$$\frac{2n}{5}\cdot\frac{7n}{8} + \frac{n}{10}\cdot\frac{n}{4} = \frac{3n^2}{8}$$

X-variables in the top half, a contradiction. Hence at least n/10 rows in the top half contain at least n/4 and at most 7n/8 X-variables; let R be n/10 such rows.

Let $C^L$ be columns 1, ..., n/2 (the “left half”) and $C^R$ be columns n/2+1, ..., n (the “right half”). Each row in R has at least as many of its X-variables in one of the two halves as in the other, so at least half of the rows in R have most of their X-variables in the same half, say $C^L$. Each of these n/20 rows has at least n/8 X-variables in the left half and at most 7n/16 X-variables in the right half; equivalently, each has at least n/8 X-variables in the left half and at least n/16 $\bar{X}$-variables in the right half.

We now fix n/20 of these rows, and the rest of the proof proceeds as in the first case with the roles of rows and columns exchanged (X-variables in the left half, $\bar{X}$-variables in the right half, and subsets of n/40 of the fixed rows), yielding a lower bound of $\binom{n/20}{n/40} = 2^{\Omega(n)}$. ∎


It is easy to see that π-MATRIX has OBDD’s of size $O(n 2^n)$: the variables are read column-wise, easily ensuring that each column has exactly one 1; furthermore, the OBDD keeps track of the subset of the rows in which 1’s have appeared, requiring width $O(2^n)$. Interestingly, for k-OBDD’s, just as the lower bound degrades roughly by a factor of k in the exponent, yielding $2^{\Omega(n/k)}$, the upper bound can similarly be improved by a factor of k in the exponent. Construct a k-OBDD of width $2^{O(n/k)}$ by reading the variables column-wise, but keeping track of only n/k rows at a time: partition the rows into k sets of size n/k each, and in segment $i = 1, \dots, k$, keep track of the subset of the ith set of rows in which 1’s have appeared. Accept only if in each segment, each of the n/k rows is found to contain exactly one 1.
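The k-pass construction can be simulated in a few lines of Python. This is a sketch under our own assumptions (k divides n, the column-count check is done in the first pass, helper names are hypothetical); the point is that during pass i the only memory carried across levels is a bitmask over n/k rows plus a flag for the current column, i.e. width $2^{O(n/k)}$.

```python
from itertools import permutations, product


def is_permutation_matrix(M):
    n = len(M)
    return (all(sum(row) == 1 for row in M)
            and all(sum(M[r][c] for r in range(n)) == 1 for c in range(n)))


def k_pass_pi_matrix(M, k):
    """Simulate the k-OBDD for pi-MATRIX: every pass reads the entries column by
    column in the same order, but pass i remembers only which rows of the i-th
    block of n/k rows already contain a 1.  Assumes k divides n."""
    n = len(M)
    size = n // k
    for i in range(k):
        lo = i * size
        seen = 0                           # bitmask over rows lo .. lo+size-1
        for c in range(n):
            ones_in_column = 0             # a real k-OBDD needs only "0 or 1 so far"
            for r in range(n):
                v = M[r][c]
                ones_in_column += v
                if i == 0 and ones_in_column > 1:
                    return 0               # column with two 1's (checked in pass 1)
                if v and lo <= r < lo + size:
                    bit = 1 << (r - lo)
                    if seen & bit:
                        return 0           # row of this block with two 1's
                    seen |= bit
            if i == 0 and ones_in_column == 0:
                return 0                   # empty column (checked in pass 1)
        if seen != (1 << size) - 1:
            return 0                       # some row of this block has no 1
    return 1


def all_matrices(n):
    for bits in product((0, 1), repeat=n * n):
        yield [list(bits[n * r:n * (r + 1)]) for r in range(n)]


def all_permutation_matrices(n):
    return [[[int(p[r] == c) for c in range(n)] for r in range(n)]
            for p in permutations(range(n))]
```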


For k-IBDD’s 


The only lower bound that is proved specifically for IBDD’s (i.e., which does not apply to linear-length oblivious programs more generally) is the lower bound of [BSSW95]. They reduce the problem to one of communication complexity in the following manner. Given an IBDD, they construct two disjoint subsets of the variables by considering the levels of the IBDD one at a time. Each level disqualifies at most one-half of the variables in each set, so that after a constant number k of levels, a constant fraction $2^{-k}$ of the variables is still retained. They argue that the problem restricted to these variables is a smaller version of the original problem, and hence the known linear lower bound on the communication complexity applies.


To demonstrate, we give an easy lower bound which has not appeared in the literature. The proof is very similar to the lower bounds of [BSSW95] and [Ge94]. Recall that [BSSW93] showed ACH ∉ C-OBDD; we will show that ACH ∉ C-IBDD.
Theorem 2 ACH ∉ C-IBDD.


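Proof: Consider a k-IBDD G computing ACH. We will show that G has size at least $2^{n/(k 2^{2k+1})}$. Recall from Section 2.4.2 that

$$\mathrm{ACH}(x,y,z) = \bigwedge_{0 \le j < n} (x_j \vee y_{j+z}) \text{ if } z \ne 0, \quad\text{and}\quad \mathrm{ACH}(x,y,z) = \bigvee_{0 \le j < n} (x_j \wedge y_j) \text{ if } z = 0.$$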


We think of G as being composed of k segments, each of which reads all the variables in the order given by some permutation. Suppose we could show that for some z there are subsets of variables

$$X_S = \{x_i : i \in S\} \subseteq X \quad\text{and}\quad Y_S = \{y_{i+z} : i \in S\} \subseteq Y$$

of size at least $n/2^{2k}$ such that for each segment of G, either all variables of $X_S$ appear before $Y_S$ or vice versa. Then we may invoke the communication complexity argument in which the players get 2k rounds or fewer. If $z > 0$, we get a fooling set of size $2^{|X_S|}$ with respect to $X_S$ by taking inputs ranging over all settings to $X_S$ and where $y_{i+z} = x_i = 1$ for $i \notin S$ and $y_{i+z} = \bar{x}_i$ for $i \in S$. For each such input $w = (x, y, z)$, we have $\mathrm{ACH}(w) = 1$, but for two different such inputs $w \ne w'$, we have either $\mathrm{ACH}(w_{X_S} w'_{\overline{X_S}}) = 0$ or $\mathrm{ACH}(w'_{X_S} w_{\overline{X_S}}) = 0$. Similarly, if $z = 0$, letting $y_i = x_i = 0$ for $i \notin S$ and $y_i = \bar{x}_i$ for $i \in S$, for each such input $w = (x, y, z)$ we have $\mathrm{ACH}(w) = 0$, but for two different such inputs $w \ne w'$ we have either $\mathrm{ACH}(w_{X_S} w'_{\overline{X_S}}) = 1$ or $\mathrm{ACH}(w'_{X_S} w_{\overline{X_S}}) = 1$ (a fooling set for the complement of ACH). If we can find such a z, $X_S$ and $Y_S$ for any given G, then the communication complexity argument implies that the width of G is exponential in $(n/2^{2k})/(2k)$.


We now show that there exist z, $X_S$, and $Y_S$ as desired. Without loss of generality, suppose the first half of the first segment of G (with respect to the 2n variables $X \cup Y$) has at least as many X variables as Y variables. Let $X_1 \subseteq X$ be the X variables in the first half and $Y_1 \subseteq Y$ the Y variables in the second half, so that $|X_1| = |Y_1| \ge n/2$. Now partition the second segment of G in “half” with respect to the variables $X_1 \cup Y_1$ only. If the first half contains at least as many X variables as Y variables, let $X_2 \subseteq X_1$ appear in the first half and $Y_2 \subseteq Y_1$ appear in the second half, so that $|X_2| = |Y_2| \ge n/4$. Otherwise, let $X_2 \subseteq X_1$ appear in the second half and $Y_2 \subseteq Y_1$ appear in the first half. Repeating this process for the k segments, we finally obtain $X_k$ and $Y_k$ of size at least $n/2^k$ with the desired alternation property. Since $X_k \times Y_k$ has size at least $n^2/2^{2k}$ and each pair $(x_i, y_j)$ in it has the form $(x_i, y_{i+z})$ for exactly one z between 0 and $n - 1$, it contains at least $n/2^{2k}$ pairs $(x_i, y_{i+z})$ for some such value of z. The $x_i$ in these pairs constitute $X_S$. ∎
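The halving procedure in this proof is effectively an algorithm, sketched below in Python. The representation of a segment as a list of tagged variables ('x', i) / ('y', i) and all helper names are our own assumptions, and the z variables are ignored since they play no role in the argument. Each segment keeps at least half of the surviving pairs, and a final averaging step over the n shifts picks z.

```python
import random


def find_alternating_pairs(segments, n):
    """Halving argument from the proof of Theorem 2 (illustrative sketch).

    segments: k orderings, each a list containing every ('x', i) and ('y', i)
    for i in range(n).  Returns (z, S) such that in every segment all of
    {x_i : i in S} come before all of {y_(i+z) mod n : i in S}, or vice versa.
    """
    X, Y = set(range(n)), set(range(n))
    for order in segments:
        alive = {('x', i) for i in X} | {('y', i) for i in Y}
        live = [v for v in order if v in alive]
        first, second = live[:len(live) // 2], live[len(live) // 2:]
        x_first = {i for t, i in first if t == 'x'}
        y_first = {i for t, i in first if t == 'y'}
        x_second = {i for t, i in second if t == 'x'}
        y_second = {i for t, i in second if t == 'y'}
        if len(x_first) >= len(y_first):
            X, Y = x_first, y_second       # X-part read before Y-part here
        else:
            X, Y = x_second, y_first       # Y-part read before X-part here
    # |X| = |Y| >= n / 2**k, so some shift z is hit by >= |X||Y|/n pairs.
    z = max(range(n), key=lambda s: sum((i + s) % n in Y for i in X))
    S = sorted(i for i in X if (i + z) % n in Y)
    return z, S


def check_alternation(segments, z, S, n):
    """True iff in every segment the x-part precedes the y-part or vice versa."""
    for order in segments:
        pos = {v: p for p, v in enumerate(order)}
        xs = [pos[('x', i)] for i in S]
        ys = [pos[('y', (i + z) % n)] for i in S]
        if not (max(xs) < min(ys) or max(ys) < min(xs)):
            return False
    return True


def random_segments(n, k, seed=0):
    rng = random.Random(seed)
    base = [('x', i) for i in range(n)] + [('y', i) for i in range(n)]
    segments = []
    for _ in range(k):
        order = base[:]
        rng.shuffle(order)
        segments.append(order)
    return segments


example = random_segments(64, 2, seed=1)
z_ex, S_ex = find_alternating_pairs(example, 64)
```

For the example (n = 64, k = 2) the guarantee from the proof is $|S| \ge n/2^{2k} = 4$.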


Note that π-MATRIX does not enjoy the same self-reducibility property: the above proof applied to the 2-IBDD for computing π-MATRIX finds $X_2$ equal to the variables in one quadrant of the matrix and $Y_2$ equal to the variables in the diagonally opposite quadrant. Indeed, for any setting to the remaining variables, only one bit of communication between the players is necessary to compute the function: player I checks that the top rows and left columns are okay, and player II checks that the bottom rows and right columns are okay.


2.5.2 For nondeterministic BDD’s 


Lower bounds for nondeterministic BDD’s also follow from the existence of exponential-size fooling sets: they imply that the function requires exponential-size nondeterministic OBDD’s when OR gates² or PARITY gates are allowed.


For example, consider an OBDD with OR nodes. We may view the corresponding communication protocol as containing nondeterministic choices by the players, giving in effect the OR of many deterministic protocols. Each such deterministic protocol determines some 1-rectangles (all-1’s minors); together, the 1-rectangles of all the protocols must cover all the 1’s of the matrix without covering any of the 0’s. Thus the communication required is at least the logarithm of the “cover number” (the number of 1-rectangles needed), or equivalently, the logarithm of the rank³ over the Boolean semiring B ({0,1} with ∧ and ∨; it is only a semiring, not a ring, because 1 has no additive inverse). As discussed earlier, the matrix corresponding to a fooling set of size |F| has all 1’s on its diagonal, no two of which may appear in the same all-1’s minor, so the cover number, or the rank over B, is at least |F|.


²The asymmetry with respect to OR/AND occurs because of the choice f(x) = 1 rather than f(x) = 0 in the definition of fooling sets.
³The rank of a matrix M over a semiring is the smallest number r of pairs of (column) vectors $(v_i, w_i)$ such that $M = \sum_{i=1}^{r} v_i w_i^{T}$. This specializes to the “cover number” in the case of B and to the dimension of the column space in the case of a field.
Similarly, with PARITY nodes, the communication required in the corresponding communication game is at least the logarithm of the rank of the matrix over GF(2). When the rows and columns of the fooling-set matrix can be ordered so that it is lower triangular with 1’s on the diagonal (as for the matrix arising in the proof of Theorem 3 below), it has full rank over GF(2) as well.


With AND nodes we have the dual of OR nodes: the communication complexity is equal to the nondeterministic communication complexity of the complement of the function, that is, the logarithm of the rank over B of the matrix with 0’s and 1’s reversed. Note that for a particular partition of the variables, this may be exponentially less than in the case of OR nodes: the function EQUAL(x, y), with respect to the partition of the variables into x and y, requires nondeterministic complexity |x| = |y|, whereas its complement has nondeterministic complexity $O(\lg |x|)$.


2.5.3 For arbitrary oblivious programs 


Alon and Maass [AM88] prove strong lower bounds for arbitrary 3-way oblivious programs by analyzing the sequence in which the variables are read by the levels of the program. In particular, for any two disjoint subsets of variables S and T, they consider the number of times this sequence alternates between reading variables of S and variables of T. They prove a theorem that says that if for every two subsets $S \subseteq X$ and $T \subseteq Y$ with $|S| = |T| = n/2^m$ (where $|X| = |Y| = n$) there are at least m alternations between S and T, then the sequence must be of length at least $\Omega(nm)$.


They use this theorem to prove a superlinear lower bound on the length of oblivious branching programs for the “sequence equality function” SEQ, defined on two ternary vectors x and y of length n where each $x_i$ and $y_i$ may be 0, 1, or 2. $\mathrm{SEQ}(x,y) = 1$ if the subsequence of x obtained by removing the 2’s is equal to the subsequence of y obtained in the same manner. A standard “cut-and-paste” (or “crossing-sequence”) argument shows that in any 3-way branching program⁴ for SEQ and for any S and T as above, the number $\ell$ of alternations between S and T must satisfy $w^{\ell} \ge 2^{|S|}$, where w is the width of the program. So for $w \le 2^{n/2^{2m}}$ this yields $\ell \ge 2^m$. In particular $\ell > m$, and so the theorem gives a lower bound of $\Omega(nm)$ on the length of the program.
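For contrast with this oblivious lower bound, here is a Python sketch of SEQ and of the obvious two-pointer evaluation (our own illustration, with hypothetical names): it reads every variable at most once, but which variable is read next depends on the values already seen, so the reading order is not oblivious. A read-once program can carry the two positions and the pending symbol of x as its state, i.e. polynomially many states.

```python
from itertools import product


def seq(x, y):
    """SEQ(x, y) = 1 iff x and y coincide after deleting every 2."""
    return int([a for a in x if a != 2] == [b for b in y if b != 2])


def seq_read_once(x, y):
    """Two-pointer evaluation: alternately fetch the next non-2 symbol of x and of y.

    Every variable is read at most once, but which variable is read next depends
    on the values seen so far.  Returns (value, list of variables read).
    """
    n = len(x)
    reads = []
    i = j = 0
    while True:
        a = None
        while i < n and a is None:         # next non-2 symbol of x, if any
            reads.append(("x", i))
            if x[i] != 2:
                a = x[i]
            i += 1
        b = None
        while j < n and b is None:         # next non-2 symbol of y, if any
            reads.append(("y", j))
            if y[j] != 2:
                b = y[j]
            j += 1
        if a is None and b is None:
            return 1, reads                # both sequences ended together
        if a != b:
            return 0, reads                # mismatch, or one sequence ended early


def check_all(n):
    """Exhaustively compare both versions on ternary vectors of length n."""
    for x in product(range(3), repeat=n):
        for y in product(range(3), repeat=n):
            value, reads = seq_read_once(x, y)
            if value != seq(x, y) or len(set(reads)) != len(reads):
                return False
    return True
```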


⁴This is a branching program in which each node has 3 edges leaving it.
Thus any oblivious program for SEQ of size $2^{o(n)}$ must have superlinear length. This lower bound for 3-way programs clearly implies the same lower bound for ordinary branching programs, where each ternary variable $x_i$ is represented by two binary variables. This implies, for instance, that SEQ ∉ C-IBDD. For comparison, SEQ has very easy read-once programs, which are non-oblivious, of length n.


In [KMW92], this lower bound for SEQ is extended to nondeterministic oblivious programs of linear length. At the same time, a simple co-nondeterministic oblivious program (with AND nodes) of linear length is given, showing that, as for read-once programs (Section 2.2.2), the two types of nondeterminism give different computational power.


Babai, Nisan, and Szegedy [BNS92], in the same spirit, improve this length/width tradeoff, using their lower bound for multiparty communication complexity to raise the lower bound on the length of polynomial-size oblivious programs (for a different function) by a factor of $\lg n$.


2.5.4 For read-once programs 


Note first that the lower bound method for OBDD’s is insufficient for read-once programs. Even though there may be many subfunctions arising from the settings to any $Y \subseteq X$ of a given size, it may also be that for each Y there is one subfunction that arises from many of the settings to Y. Since different paths may read the variables of X in different orders, different sets Y may be the “first” ones read, depending upon the values of the variables. In this case, we have not excluded the possibility that the first m input bits are read in such a way that the program needs nodes only for the “large” subfunctions (those arising from many settings) on the various $\bar{Y}$ of size $n - m$.


For example, we saw that for the function ACH, there is a fooling set of size $2^{n/4}$ for any subset of half the X and Y variables (Theorem 2, specialized to OBDD’s). However, there is a simple read-once program that reads the z variables first and then reads the X and Y variables in the appropriate order, pair by pair. Looking closely at this program, we see that there are n different subsets of half the X, Y variables that may be read first. For each subset there is a large fooling set, implying that there are many possible subfunctions on the remaining variables. However, the values of the variables (specifically, of the z variables) that give rise to these many subfunctions cause other paths to be taken through the program. For the path that leads to a given subset of X, Y, there are only two subfunctions (either a constant function or the induced ACH function) arising from the many settings to the variables (in z and the rest of X, Y) read so far.


In order to prove lower bounds for read-once programs, we must show not only that there are many subfunctions, but also that each arises in very few ways. Simon and Szegedy [SS93] distill this idea into a lemma which may be considered a paradigm for proving read-once lower bounds. This technique appears implicitly in the read-once lower bounds of [We88, Za84] and explicitly in those of [Ju88, Kr88, Du85]; the generalization in [SS93] enables an easier proof of the lower bound of [BHST87] and others [We87, Du85, Ju88]. Simon and Szegedy use this technique to reprove a theorem of Babai et al. [BHST87], that read-once programs require size $2^{\Omega(n^2)}$ to count modulo 2 the number of triangles in an n-node graph. They also give a simple proof that size $2^{n/19}$ is required to tell whether an n-node graph is (n/2)-regular. Since the lemma is a central part of our lower bound for multiplication, we provide a proof in Chapter 3.


2.6 Integer multiplication 


By integer multiplication, we will refer to the Boolean function $\mathrm{MULT}\colon \{0,1\}^{2n} \to \{0,1\}$ that computes the middle bit in the product of two n-bit integers. That is, $\mathrm{MULT}(x,y) = z_{n-1}$, where $x = x_{n-1} \cdots x_0$, $y = y_{n-1} \cdots y_0$, and $z_{2n-1} \cdots z_0 = z = x \cdot y$ is the product of the integers represented in binary by x and y. The middle bit is the “hardest” bit, in the sense that if it can be computed by read-once branching programs (or almost any computational model) of size s(n), then any other bit can be computed with size at most s(2n).
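One way to see the remark about the middle bit, sketched in Python under our own conventions (bits least significant first; the helper names are hypothetical): shifting x and y by a total of $2n - 1 - j$ positions moves bit j of the product into the middle position $2n - 1$ of a 2n-bit instance. Since the shift only fixes some inputs to the constant 0 and renames the others, a read-once program for the 2n-bit middle bit yields a read-once program of no larger size for bit j.

```python
from itertools import product


def to_int(bits):
    """Integer from a bit sequence, least significant bit first (bits[i] = x_i)."""
    return sum(b << i for i, b in enumerate(bits))


def mult(x_bits, y_bits):
    """MULT: the middle bit z_{n-1} of the product of two n-bit integers."""
    n = len(x_bits)
    return (to_int(x_bits) * to_int(y_bits) >> (n - 1)) & 1


def product_bit_via_middle(x_bits, y_bits, j):
    """Bit z_j (0 <= j <= 2n-1) of x*y from one call of MULT on 2n-bit inputs:
    shifting x by s1 and y by s2 positions, s1 + s2 = 2n-1-j, moves z_j
    into the middle position 2n-1 of the larger product."""
    n = len(x_bits)
    s = 2 * n - 1 - j
    s1 = min(s, n)
    s2 = s - s1
    xp = [0] * s1 + list(x_bits) + [0] * (n - s1)
    yp = [0] * s2 + list(y_bits) + [0] * (n - s2)
    return mult(xp, yp)


def check_all_bits(n):
    return all(product_bit_via_middle(list(xb), list(yb), j)
               == ((to_int(xb) * to_int(yb)) >> j) & 1
               for xb in product((0, 1), repeat=n)
               for yb in product((0, 1), repeat=n)
               for j in range(2 * n))
```

For instance, `mult([1, 0, 1, 0], [1, 1, 0, 0])` multiplies 5 by 3 and returns bit 3 of 15, namely 1.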


2.6.1 Bryant’s lower bound


Bryant [Br91] gives the following lower bound for MULT; Gergov [Ge94] notices that the proof holds also for nondeterministic OBDD’s, as noted at the end of the proof below.
Theorem 3 MULT ∉ OBDD.


Proof: We will show that with respect to any subset $S \subseteq \{x_1, \dots, x_n\}$ of size n/2 (corresponding to the first n/2 variables of x read by an OBDD), MULT has a fooling set of size $2^{n/8}$. The elements of the fooling set differ only in their settings to the $x_i$; the $y_i$ are fixed so that the multiplication is reduced to computing the sum of two integers, one corresponding to a subsequence of $x_1, \dots, x_{n/2}$ and the other corresponding to a subsequence of $x_{n/2+1}, \dots, x_n$. The nth bit of the product is the high-order bit in this sum.


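Choose these two subsequences so that for each i, the ith bit of one is in S and the ith bit of the other is in $\bar{S}$, and they are equally far apart in x for all i. To do this, let

$$S_L = S \cap \{x_1, \dots, x_{n/2}\} \quad\text{and}\quad S_R = S \cap \{x_{n/2+1}, \dots, x_n\},$$

and similarly define $\bar{S}_L$ and $\bar{S}_R$ for $\bar{S}$. Since $|\bar{S}_R| = |S_L|$ and $|\bar{S}_L| = |S_R| = n/2 - |S_L|$, it is easy to show that

$$|S_L \times \bar{S}_R| + |\bar{S}_L \times S_R| \ge n^2/8,$$

and since $1 \le |i - j| < n$ for each $(x_i, x_j) \in (S_L \times \bar{S}_R) \cup (\bar{S}_L \times S_R)$, by the pigeonhole principle at least n/8 of these pairs lie at a common distance; these give a subset of size n/8 with the desired property.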


Exactly two bits of y are set to 1 in such a way that these two subsequences “line up” and so that the carry out of their high-order bit corresponds to the nth bit in the product of x and y. The bits of x not contained in either subsequence are set to 0 unless they are in $\{x_{n/2+1}, \dots, x_n\}$ and lie “in between” the bits of the subsequence. This causes carry bits of the addition to propagate as desired and thereby reduce the multiplication of x and y to the addition of the two integers determined by the subsequences. See Figure 2.2.


We may think of the addition of these two integers as the addition of an integer determined by the setting to S and an integer determined by the setting to $\bar{S}$. The fooling set ranges over all settings to the integer determined by S. Each of these two integers may take on any value between 0 and $2^{n/8} - 1$, in turn making the nth bit of the product 1 if their sum is at least $2^{n/8}$ and 0 otherwise.


The corresponding matrix has rows indexed by all $2^{n/8}$ settings to S’s integer and columns indexed by all $2^{n/8}$ settings to $\bar{S}$’s integer. After deleting the 0-column and 0-row and indexing appropriately, this matrix is lower-triangular with all 1’s in the lower half. It thus has full rank over B and over GF(2), and so does its complement. It follows that MULT requires exponential-size OBDD’s and k-OBDD’s even if OR, PARITY, or AND nodes are present. ∎

Figure 2.2: The multiplication of x and y is reduced to computing the carry bit in the sum of the integers represented by the two subsequences $x_i x_j x_k$ and $x_p x_q x_r$. For each corresponding pair $(x_i, x_p)$, $(x_j, x_q)$, and $(x_k, x_r)$, one variable is in S and the other in $\bar{S}$. The fooling set ranges over all settings to the variables in S, with each variable’s “partner” getting the complementary setting. The remaining variables are set as shown in order to achieve the desired reduction.
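The rank claim at the end of the proof of Theorem 3 can be checked numerically for small m = n/8. The following Python sketch (our own helper names) builds the carry matrix $M_{ab} = [a + b \ge 2^m]$ for $a, b = 1, \dots, 2^m - 1$, i.e. with the 0-row and 0-column deleted, and computes its rank over GF(2) by Gaussian elimination on bitmask rows; the result is $2^m - 1$, i.e. full rank.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a 0/1 matrix given as a list of rows."""
    pivots = {}                      # leading bit -> stored row with that leading bit
    rank = 0
    for row in rows:
        v = int("".join(map(str, row)), 2) if row else 0
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = v     # new independent row
                rank += 1
                break
            v ^= pivots[lead]        # eliminate the leading bit
    return rank


def carry_matrix(m):
    """M[a][b] = 1 iff a + b >= 2**m (carry out of an m-bit addition),
    for a, b = 1, ..., 2**m - 1."""
    N = 1 << m
    return [[int(a + b >= N) for b in range(1, N)] for a in range(1, N)]
```

Row a of this matrix has its 1's exactly in the last a columns, so the rows are distinct "suffix" vectors and are linearly independent over any field, which is the triangular structure used in the proof.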


Gergov [Ge94] further generalizes Bryant’s lower bound for MULT to arbitrary oblivious programs of linear length by using the main lemma from [AM88]. For any program of length kn, the lemma implies the existence of two “large” disjoint subsets of X (of size $n/(k 2^{2k})$) such that there are few ($O(k)$) levels where the program changes from reading variables of one set to reading variables of the other set. The problem is now reduced to one of communication complexity with O(k) rounds, and it is easy to carry through the rest of Bryant’s proof to find a fooling set of size $2^{n/2^{O(k)}}$. Thus, the program has size at least $2^{n/2^{O(k)}}$. As reasoned above, this bound holds even if nondeterministic nodes are present.
2.6.2 The decision problem DMULT—the graph of multiplication


Although it is not directly related to the issue of verification, another Boolean function that has been considered is the decision problem DMULT of recognizing the graph of multiplication. That is, DMULT$(x, y, z) = 1$ if and only if $xy = z$. Note that it is not readily apparent which problem is "harder", MULT or DMULT. On the one hand, DMULT seems to require practically computing all the bits of $xy$; on the other hand, an algorithm for DMULT has the advantage of inspecting all the bits of $z$, the putative product. Buss [Bu92] proves that DMULT $\notin$ AC$^0$ by reducing to it the problem of counting the number of 1's in the input (and therefore MULT and PARITY, by results of [CSV84]); for comparison, [FSS84] gives an easy reduction of PARITY to MULT to show MULT $\notin$ AC$^0$.


A simple argument [We94] shows that computing DMULT with read-once programs is as hard as factoring. Given a polynomial-size read-once program for DMULT and any integer $n$, the following procedure will either factor $n$ or determine that it is prime. First instantiate $n$ as the bits of $z$ in the read-once program, where $|z| = 2\lg n$ and $|x| = |y| = \lg n$. There is a satisfying assignment to the remaining input bits since $1 \cdot z = z$. Now attempt to construct a nontrivial factor by instantiating the bits of $x$ one at a time, maintaining the satisfiability of the program after each bit. If the only successful instantiations for $x$ are 1 and $z$, then $z$ is prime; otherwise, a nontrivial factor is determined. Since we can test the satisfiability of a read-once program in polynomial time, the entire procedure can be executed in polynomial time.
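The procedure can be sketched in Python under two assumptions of our own. First, the polynomial-time satisfiability test for the partially instantiated read-once program is replaced by a hypothetical brute-force function `dmult_satisfiable`, so the sketch is exponential and only illustrates the control flow. Second, both $x$ and $y$ are given $\lfloor \lg n \rfloor$ bits; with this width choice both factors are forced below $n$, so any satisfying assignment is already a nontrivial factorization and the special handling of the factors 1 and $z$ is unnecessary.

```python
from itertools import product


def dmult_satisfiable(z, width, x_prefix):
    """Brute-force stand-in for the satisfiability test of a read-once program
    for DMULT with z fixed: is there an x starting with the given high-order
    bits, and a y, both `width` bits wide, with x * y == z?"""
    for tail in product((0, 1), repeat=width - len(x_prefix)):
        x = int("".join(map(str, list(x_prefix) + list(tail))), 2)
        if x > 0 and z % x == 0 and z // x < (1 << width):
            return True
    return False


def factor_or_none(n):
    """Return a nontrivial factor of n >= 2, or None if n is prime."""
    width = n.bit_length() - 1  # floor(lg n): both factors must be below n
    if width == 0 or not dmult_satisfiable(n, width, []):
        return None
    prefix = []
    for _ in range(width):
        # Keep the program satisfiable: try bit 0; if that fails, bit 1 must work.
        prefix.append(0 if dmult_satisfiable(n, width, prefix + [0]) else 1)
    return int("".join(map(str, prefix)), 2)


print(factor_or_none(91), factor_or_none(97))  # 7 None
```

Because bit 0 is preferred at every step, the sketch returns the smallest factor that fits in the chosen width.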


Jukna [Ju94] proves an exponential lower bound for DMULT on non-deterministic read-k-times branching programs. His lower bound follows the framework of [BRS93], and gives a simple reduction of DMULT to the problem of recognizing codewords of a linear code, for which an exponential lower bound is proved in [Ju92].


2.7 Related issues 
2.7.1 The ordering problem for OBDD’s 


When using OBDD's for verification, one naturally wants to minimize their size. For a given function, the order in which the variables are read greatly affects the number of nodes required: it is easy to exhibit functions which have small OBDD's for good orderings but require exponential size for poor orderings. Thus, an important and interesting question is how to determine the ordering that minimizes size for a given function. The decision problem is: given an OBDD and an integer $k$, determine whether there is an OBDD (possibly obeying a different ordering of the variables) with fewer than $k$ nodes that computes the same function. This problem was recently proved to be NP-complete in [BW95], extending the work of [BW95, THY93], via a nice reduction from OPTIMAL-LINEAR-ARRANGEMENT [GJ79].
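The effect of the ordering is easy to observe on small examples. The Python sketch below is our own illustration, not an algorithm from [BW95]: it counts the nodes of the reduced OBDD for a given order by enumerating subfunctions, using the characterization that the number of nodes labeled with the $i$th variable equals the number of distinct subfunctions (obtained by fixing the earlier variables) that essentially depend on the $i$th variable. It compares two orders for the standard example $x_1y_1 \vee x_2y_2 \vee x_3y_3$.

```python
from itertools import product


def obdd_size(f, order):
    """Nodes (including sinks) of the reduced OBDD for f under `order`.

    f maps a dict {variable: bit} to 0 or 1. For each level i we collect the
    distinct subfunctions obtained by fixing order[:i] that depend on order[i];
    each one corresponds to exactly one node labeled order[i].
    """
    n = len(order)
    size = 0
    for i in range(n):
        subfunctions = set()
        for prefix in product((0, 1), repeat=i):
            table = tuple(
                f(dict(zip(order, prefix + rest)))
                for rest in product((0, 1), repeat=n - i)
            )
            half = len(table) // 2  # first half: order[i] = 0, second: order[i] = 1
            if table[:half] != table[half:]:
                subfunctions.add(table)
        size += len(subfunctions)
    sinks = {f(dict(zip(order, bits))) for bits in product((0, 1), repeat=n)}
    return size + len(sinks)


def f(a):
    return a["x1"] & a["y1"] | a["x2"] & a["y2"] | a["x3"] & a["y3"]


good = ["x1", "y1", "x2", "y2", "x3", "y3"]
bad = ["x1", "x2", "x3", "y1", "y2", "y3"]
print(obdd_size(f, good), obdd_size(f, bad))  # 8 16
```

With $k$ pairs the interleaved order gives $2k + 2$ nodes while the separated order gives $2^{k+1}$; the enumeration is exponential and is meant only for toy instances.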


It would be useful to find an efficient algorithm to determine an approximately optimal ordering. Many heuristics for improving an ordering can be found in the literature (see [BW95]). It is worth mentioning that the use of randomization has not been explored, either in helping to determine good variable orderings or in the verification strategy more generally.


2.7.2 The Fourier spectrum 


The Fourier spectrum of Boolean functions has been widely studied over the past few years. Properties of the Fourier spectrum have been used in a variety of applications, perhaps most strikingly in deriving efficient algorithms for learning (e.g., [KM91]). Two properties of the spectrum that have proven useful for this purpose are small $L_1$-norm (that is, the sum of the absolute values of the coefficients) and a knowledge of which coefficients are the largest. For example, [KM91] gives an efficient learning algorithm for functions whose spectrum is either sparse or has polynomial $L_1$-norm.


It is easy to show that the $L_1$-norm of a function is bounded by the number of leaves in any decision tree for that function, even if the nodes may query the parity of arbitrary subsets of the variables. Moreover, [LMN89] proves that functions in AC$^0$ have most of the weight of their spectrum in the coefficients of small sets. These results are used to derive efficient learning algorithms for functions in AC$^0$ and functions with shallow decision trees.


Since OBDD's are such a constrained model of computation, perhaps interesting and useful properties can be derived about the spectrum of the functions they compute. Some negative results are known: Bruck and Smolensky [BS90] demonstrate a function in AC$^0$ that has exponential $L_1$-norm; this function is easily computed by polynomial-size OBDD's. They also exhibit a function (inner product modulo 2), also easily computed by an OBDD, whose transform has exponentially small $L_\infty$-norm (every coefficient of inner product modulo 2 on $n$ variables has magnitude $2^{-n/2}$). This is an even stronger result and further implies that any polynomial $p(x_1, \ldots, x_n)$ whose sign represents this function (i.e., which is negative exactly when the function is 1) must have $2^{\Omega(n)}$ non-zero coefficients.
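For small functions the spectrum can be computed directly. The Python sketch below assumes the common convention of viewing $f$ as the $\pm 1$-valued function $(-1)^{f(x)}$, so that $\hat{f}(S) = 2^{-n}\sum_x (-1)^{f(x) + S \cdot x}$; the helper names are our own. Exact arithmetic with `fractions.Fraction` shows that parity has $L_1$-norm 1 (matching a single parity query with two leaves), that AND on 3 variables has $L_1$-norm $5/2$ (below the 4 leaves of its obvious decision tree), and that inner product modulo 2 on 4 variables has every coefficient of magnitude $2^{-2}$.

```python
from fractions import Fraction
from itertools import product


def fourier_coefficients(f, n):
    """Exact coefficients of (-1)**f(x): hat f(S) = 2^-n * sum_x (-1)^(f(x) + S.x),
    with each subset S of the n variables encoded as a 0/1 tuple."""
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for s in points:
        total = sum((-1) ** (f(x) + sum(a * b for a, b in zip(s, x))) for x in points)
        coeffs[s] = Fraction(total, 2 ** n)
    return coeffs


def l1_and_linf(f, n):
    """Return the L1-norm and the L-infinity norm of the spectrum."""
    values = [abs(c) for c in fourier_coefficients(f, n).values()]
    return sum(values), max(values)


def parity(x):
    return sum(x) % 2


def and_all(x):
    return int(all(x))


def inner_product(x):  # x = (x_1..x_k, y_1..y_k); returns sum of x_i * y_i mod 2
    k = len(x) // 2
    return sum(x[i] * x[k + i] for i in range(k)) % 2


for name, g, n in [("parity", parity, 4), ("AND", and_all, 3), ("IP", inner_product, 4)]:
    l1, linf = l1_and_linf(g, n)
    print(name, l1, linf)  # parity 1 1 / AND 5/2 3/4 / IP 4 1/4
```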


Comparing OBDD's with constant-depth circuits, we note that PARITY, though not in AC$^0$, is easily computed by small OBDD's, while 7-MATRIX is easily in AC$^0$ but requires exponential-size OBDD's.


2.7.3 Read-once programs and resolution proofs


If we consider branching programs for computing multi-valued functions, we may find a nice correspondence with resolution proofs.


A resolution proof for a CNF formula $\phi$ is a straight-line program for proving that $\phi$ is not satisfiable. At each step, two previously obtained clauses, $(x_i \vee \alpha)$ and $(\bar{x}_i \vee \beta)$, are "resolved on $x_i$" to obtain a new clause $(\alpha \vee \beta)$, which is satisfied by every assignment that satisfies both previous clauses ($\alpha$ and $\beta$ are disjunctions of literals). The proof is complete when the empty clause is obtained. Such a proof is naturally viewed as a directed acyclic graph where the clauses correspond to the nodes of the graph: the original clauses of $\phi$ are "input" nodes with indegree 0, the newly obtained clauses are "internal" nodes with indegree 2, and the empty clause is the "output" node with outdegree 0. Such a resolution proof is called regular if on every directed path from an input node to an output node, each variable is resolved at most once.


We may consider a branching program for an unsatisfiable CNF formula $\phi$ that solves the following "search" problem: given an assignment $x$, find a clause of $\phi$ that is not satisfied. It is an observation of Chvátal and Szemerédi (see [LNNW95]) that read-once programs for this problem are isomorphic to regular resolution proofs. Taken together with the fact that a decision tree is a read-once branching program, [LNNW95] notes that $D(\phi) \ge \lg \mathrm{RRES}(\phi)$, where $D(\phi)$ is the depth of the shallowest decision tree for this search problem and $\mathrm{RRES}(\phi)$ is the smallest number of steps in a regular resolution proof of $\phi$.
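One direction of this correspondence, from a resolution refutation to a program for the search problem, can be sketched in Python. The representation (clauses as frozensets of nonzero integers, with $-v$ denoting $\bar{x}_v$, and a dictionary of named proof nodes) is our own assumption. Starting at the empty clause, which every assignment falsifies, the walk queries the variable resolved at the current node and moves to the parent whose literal on that variable is false, so the current clause always stays falsified; on a regular proof no variable is queried twice along a path.

```python
def resolve(pos, neg, var):
    """Resolve clause `pos` (containing var) with `neg` (containing -var) on var."""
    assert var in pos and -var in neg
    return (pos - {var}) | (neg - {-var})


def clause_of(proof, name):
    """The clause at a proof node, recomputing resolvents from their parents."""
    node = proof[name]
    if node[0] == "input":
        return node[1]
    _, var, pos_parent, neg_parent = node
    return resolve(clause_of(proof, pos_parent), clause_of(proof, neg_parent), var)


def falsified_clause(proof, root, assignment):
    """Solve the search problem by walking from the empty clause to an input.

    Invariant: the clause at the current node is falsified by `assignment`.
    At a resolvent on var, the parent whose literal on var is false under the
    assignment is again falsified, so we move there.
    """
    name = root
    while proof[name][0] == "resolve":
        _, var, pos_parent, neg_parent = proof[name]
        name = neg_parent if assignment[var] else pos_parent
    return proof[name][1]


# phi = (x1) & (~x1 | x2) & (~x2)
proof = {
    "c1": ("input", frozenset({1})),
    "c2": ("input", frozenset({-1, 2})),
    "c3": ("input", frozenset({-2})),
    "c4": ("resolve", 1, "c1", "c2"),  # derives (x2)
    "c5": ("resolve", 2, "c4", "c3"),  # derives the empty clause
}
assert clause_of(proof, "c5") == frozenset()
for x1 in (False, True):
    for x2 in (False, True):
        print(x1, x2, sorted(falsified_clause(proof, "c5", {1: x1, 2: x2})))
```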
In general, an arbitrary resolution proof for $\phi$ yields a branching program for this search problem, but not vice versa: in fact, there are formulas for which $\mathrm{RES}(\phi)$ is exponential (e.g., [CS88]), even though there is always a branching program of size $O(|\phi|)$ for the search problem.
CHAPTER 3


A lower bound for multiplication 
with read-once programs 


This chapter describes a lower bound of $2^{\Omega(\sqrt{n})}$ on the size of read-once branching programs for the function MULT. This is the first superpolynomial lower bound for multiplication on non-oblivious branching programs. This result demonstrates that relaxing the ordering restriction of OBDD's is insufficient to gain the computational power desired for the purpose of hardware verification.


The lower bound for multiplication is motivated by the work of Simon and Szegedy [SS93], who give a basic lemma for proving lower bounds on the size of read-once branching programs. The lemma involves Nečiporuk's method of counting the subfunctions that are possible when some subset of input bits is fixed. We begin by describing this lemma in Section 3.1. For ease of presentation we first prove a lower bound of $2^{\Omega(\sqrt[3]{n})}$ in Section 3.2, and then extend the proof to achieve $2^{\Omega(\sqrt{n})}$ in Section 3.3.


In Section 3.4, we define the notion of read-once reductions in order to deduce similar lower bounds for other arithmetic functions.


3.1 A paradigm for read-once lower bounds 


Let $f$ be a Boolean function, $f : \{0,1\}^n \to \{0,1\}$, and let $X = \{x_0, \ldots, x_{n-1}\}$ be its $n$ binary input variables. Let $\mathcal{F}$ be a filter on $X$. (That is, $\mathcal{F} \subseteq 2^X$ and $\mathcal{F}$ is closed upward: if $S \in \mathcal{F}$, then all supersets of $S$ are in $\mathcal{F}$.) A subset $B \subseteq X$ is said to be in the boundary of $\mathcal{F}$ if $B \notin \mathcal{F}$ but $(B \cup \{x_i\}) \in \mathcal{F}$ for some $x_i$. By setting the values of $\bar{B} = X \setminus B$, we naturally induce a function on $B$. The lemma is stated below in the form we will need it; it appears in [SS93] in slightly more generalized form.


Lemma 1 (Simon and Szegedy) If for any $B$ in the boundary of $\mathcal{F}$, at most $2^{|\bar{B}|}/L$ settings to $\bar{B}$ induce the same subfunction on $B$, then any read-once branching program computing $f$ has size at least $L$.


For completeness, we now provide a proof of this lemma. 


Proof: The idea is to identify a "frontier" of edges in the branching program (a cut containing exactly one edge from each source-to-sink path) in which every edge allows only a fraction $1/L$ of the inputs in $\{0,1\}^n$ to pass through it. Since the path of every input passes through some frontier edge, there must be at least $L$ such edges. Having fan-out 2 and only one root, the program also has at least $L$ nodes. This is because if the endvertices of the frontier edges were distinct, they would be the leaves of an embedded binary tree which must contain $L - 1$ distinct internal nodes. Since the two sinks are not among these internal nodes, there are at least $L + 1$ nodes in the program.


In order to characterize a frontier, we first associate with each node of the program the set of variables appearing in the subprogram rooted there, that is, those variables appearing on nodes that are reachable from the given node. Clearly, along any path through the program, the variable-sets of later nodes are subsets of the variable-sets of earlier nodes. A frontier consists of those edges going from nodes with "large" sets of variables to nodes with "small" sets. "Large" sets are defined to be those that are in the filter $\mathcal{F}$. Clearly there is exactly one frontier edge on each source-to-sink path, as (for nontrivial filters $\mathcal{F}$) the root has the variable-set $X \in \mathcal{F}$ and the sinks have the variable-set $\emptyset \notin \mathcal{F}$. With each frontier edge we associate a set $B \subseteq X$ in the boundary of $\mathcal{F}$.


Suppose boundary set $B$ is associated with a given frontier edge. Because the program is read-once, these variables do not appear on any path from the root to this edge. In fact, the inputs $x \in \{0,1\}^n$ that reach this edge are characterized exactly by their settings to $\bar{B}$. Each setting to $\bar{B}$ that reaches this edge clearly induces the same subfunction on $B$, as defined by the subprogram rooted there. Since at most $2^{|\bar{B}|}/L$ settings to $\bar{B}$ give the same subfunction on $B$, at most $(2^{|\bar{B}|}/L) \cdot 2^{|B|} = 2^n/L$ inputs in $\{0,1\}^n$ may pass through this frontier edge. The lower bound of $L$ then follows. ∎
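The quantity that Lemma 1 asks us to bound can be computed by brute force for small functions. The Python sketch below (helper names are our own) returns, for one fixed set $B$, the largest number of settings to $\bar{B}$ that induce the same subfunction on $B$. Note that the lemma needs this bound for every $B$ in the boundary of a filter, so the value $2^{|\bar{B}|}$ divided by the class size printed here is only the $L$ that this single $B$ would permit.

```python
from collections import Counter
from itertools import product


def largest_class(f, n, B):
    """Largest number of settings to the variables outside B that induce the
    same subfunction on B. Variables are 0..n-1; f takes a list of n bits."""
    outside = [i for i in range(n) if i not in B]
    classes = Counter()
    for rho in product((0, 1), repeat=len(outside)):
        x = [0] * n
        for i, b in zip(outside, rho):
            x[i] = b
        table = []  # truth table of the subfunction induced on B
        for sigma in product((0, 1), repeat=len(B)):
            for i, b in zip(B, sigma):
                x[i] = b
            table.append(f(x))
        classes[tuple(table)] += 1
    return max(classes.values())


def parity(x):
    return sum(x) % 2


def halves_equal(x):
    return int(x[:3] == x[3:])


for g, B in [(parity, [0, 1, 2]), (halves_equal, [3, 4, 5])]:
    c = largest_class(g, 6, B)
    print(g.__name__, c, 2 ** (6 - len(B)) // c)  # class size, L permitted by this B
```

Parity induces only two subfunctions, so large classes; comparing the two halves induces a different subfunction for every setting, the situation the lemma rewards.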


3.2 A lower bound of $2^{\Omega(\sqrt[3]{n})}$


Theorem 4 Any read-once branching program for MULT has size $2^{\Omega(\sqrt[3]{n})}$.


Proof: Let $m = \sqrt[3]{n}/4$ and let $X$ and $Y$ denote the sets of variables $X = \{x_0, \ldots, x_{n-1}\}$ and $Y = \{y_0, \ldots, y_{n-1}\}$. Define the filter

$$\mathcal{F} = \{V \subseteq (X \cup Y) : |V \cap X| > n - m \text{ and } |V \cap Y| > n - m\}.$$

Roughly speaking this filter marks the frontier of the program where at most $m$ bits of $X$ and at most $m$ bits of $Y$ have been read.¹


We will show that for any $B$ in the boundary of $\mathcal{F}$, at most $2^{|\bar{B}|-m}$ settings to $\bar{B}$ give the same subfunction on $B$. By Lemma 1, this gives the desired lower bound of $2^m$. Fix any $B$ in the boundary of $\mathcal{F}$ and let $S = \bar{B}$. Think of $S$ as being the variables already read by the branching program. Since $B$ is in the boundary of $\mathcal{F}$, either $|S \cap X| = m$ or $|S \cap Y| = m$. We will show that there is a subset $S' \subseteq S$ of size at least $m$ such that if two settings to $S$ differ on $S'$ then they induce different subfunctions on $\bar{S} = B$. Thus at most $2^{|S|-m}$ settings to $S = \bar{B}$ induce the same subfunction on $\bar{S} = B$, as desired. We will show that the two subfunctions are different by explicitly demonstrating a single setting to the bits of $\bar{S}$ where the induced subfunctions of MULT differ.


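Suppose without loss of generality that $|S \cap X| = m$ (and $|S \cap Y| < m$). Let $i \in \{0, \ldots, n-1\}$ be the smallest index such that $y_i \notin S$. Let

$$S' = \{y_0, \ldots, y_{i-1}\} \cup \left(S \cap \{x_0, \ldots, x_{n-1-i}\}\right).$$

Note that because $\{y_0, \ldots, y_{i-1}\} \subseteq S$ and $|S \cap X| = m$, we have $|S'| \ge m$: at most $i$ variables of $S \cap X$ (those among $x_{n-i}, \ldots, x_{n-1}$) are left out of $S'$, while $S'$ contains the $i$ variables $y_0, \ldots, y_{i-1}$.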
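The counting behind $|S'| \ge m$ can be confirmed exhaustively for small parameters. In the Python sketch below (our own; `s_prime` and `check_size_bound` are hypothetical names), $S$ is given by the index sets of its $x$- and $y$-variables, and every $S$ with $|S \cap X| = m$ and $|S \cap Y| < m$ is tried.

```python
from itertools import combinations


def s_prime(sx, sy, n):
    """S' from the proof of Theorem 4, for S with x-indices sx and y-indices sy.

    Returns (i, S'), where i is the smallest index with y_i not in S and S'
    is a set of ("x", k) / ("y", k) variable names."""
    i = next(k for k in range(n) if k not in sy)
    return i, {("y", k) for k in range(i)} | {("x", k) for k in sx if k <= n - 1 - i}


def check_size_bound(n, m):
    """Confirm |S'| >= m whenever |S & X| = m and |S & Y| < m."""
    for sx in combinations(range(n), m):
        for size in range(m):
            for sy in combinations(range(n), size):
                _, sp = s_prime(set(sx), set(sy), n)
                if len(sp) < m:
                    return False
    return True


print(check_size_bound(8, 3))  # True
```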


¹In order for this notion to be strictly correct, "have been read" must be interpreted to mean "appear on any path from the root".
Let us adopt the following notation for the integers obtained from partial settings to the variables. For a setting $\alpha$ to $W \subseteq X \cup Y$ (i.e., $\alpha : W \to \{0,1\}$), let $x_\alpha$ denote the integer that is represented in binary when the variables of $X \cap W$ have the values given by $\alpha$ and the variables of $X \setminus W$ are each 0. Define $y_\alpha$ similarly. For a single variable $z \notin W$, let "$\alpha + z$" denote the setting to $W \cup \{z\}$ that further sets $z = 1$. For two settings $\alpha$ and $\tau$ to disjoint subsets $W$ and $V$, let "$\alpha \cup \tau$" denote the setting equal to $\alpha$ on $W$ and to $\tau$ on $V$. Finally, let $(x)_i$ denote the $i$th bit in the binary representation of integer $x$, so $x = \sum_i (x)_i 2^i$.
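The notation translates directly into code. In this Python sketch (our own encoding), a setting is a dictionary from variable names such as `("x", 2)` to bits; `integer_of`, `plus`, `union`, and `bit` mirror $x_\alpha$ / $y_\alpha$, $\alpha + z$, $\alpha \cup \tau$, and $(x)_i$.

```python
def integer_of(alpha, name):
    """x_alpha (name "x") or y_alpha (name "y"); alpha maps (name, index) -> bit,
    and variables outside alpha's domain count as 0."""
    return sum(b << k for (var, k), b in alpha.items() if var == name)


def plus(alpha, z):
    """The setting alpha + z: alpha extended by setting the unset variable z to 1."""
    assert z not in alpha
    return {**alpha, z: 1}


def union(alpha, tau):
    """alpha U tau for settings to disjoint sets of variables."""
    assert not set(alpha) & set(tau)
    return {**alpha, **tau}


def bit(x, i):
    """(x)_i, the i-th bit of x."""
    return (x >> i) & 1


alpha = {("x", 0): 1, ("x", 2): 1, ("y", 1): 1, ("y", 2): 0}
beta = plus(alpha, ("y", 0))
print(integer_of(alpha, "x"), integer_of(alpha, "y"))  # 5 2
print(integer_of(beta, "x") * integer_of(beta, "y"))   # 15
```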

Let $\alpha$ and $\beta$ be two settings to $S$ that differ on some bit in $S'$. Our goal is thus to find a setting $\tau$ to the bits of $\bar{S}$ so that $(x_{\alpha\cup\tau}\, y_{\alpha\cup\tau})_{n-1} \ne (x_{\beta\cup\tau}\, y_{\beta\cup\tau})_{n-1}$.


We proceed in two stages, according to Lemmas 2 and 3. First we ensure, by setting to 1 (if necessary) a single variable $z$ of $\bar{S}$, that the two products $x_{\alpha+z}y_{\alpha+z}$ and $x_{\beta+z}y_{\beta+z}$ differ in a "high-order" bit, that is, a bit position in the range $[n-m-3, n-1]$ (we aren't concerned with higher bit positions). In the second stage, we set to 1 a pair of variables of $\bar{S}$, one in $X$ and one in $Y$, so that the resulting products differ in a higher high-order bit position. We iterate this second stage, repeatedly setting a pair of variables until the resulting products differ in bit position $n-1$. It follows that $\alpha$ and $\beta$ induce different subfunctions on $\bar{S}$: the subfunctions differ when $\bar{S}$ has $z$ and the pairs from the second stage all set to 1 and the remaining bits of $\bar{S}$ set to 0.


Lemma 2 If for all $i \in [n-m-3, n-1]$ we have $(x_\alpha y_\alpha)_i = (x_\beta y_\beta)_i$, then there is a single variable $z \in \bar{S}$ such that

$$(x_{\alpha+z}\, y_{\alpha+z})_i \ne (x_{\beta+z}\, y_{\beta+z})_i$$

for some $i \in [n-m-3, n-1]$.


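Lemma 3 Let $T \subseteq X \cup Y$, and let $\alpha$ and $\beta$ be two settings to $T$. Let $d$ be the greatest index in $[0, n-2]$ such that $(x_\alpha y_\alpha)_d \ne (x_\beta y_\beta)_d$. If $d \ge n-m-3$ and $\max(|T \cap X|, |T \cap Y|) = t \le 3m$, then there are two variables, $x_u \in X \setminus T$ and $y_v \in Y \setminus T$, such that

$$(x_{\alpha'}\, y_{\alpha'})_{d+1} \ne (x_{\beta'}\, y_{\beta'})_{d+1},$$

where $\alpha' = \alpha + x_u + y_v$ and $\beta' = \beta + x_u + y_v$.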
Theorem 4 follows from these lemmas as outlined above. Notice that Lemma 3 is first applied with $t \le m+1$, and since we must apply Lemma 3 at most $m+3$ times, each time setting one more variable of $X$ and of $Y$, we maintain $t \le 2m+4 \le 3m$ as required. ∎


We now give the proofs of Lemmas 2 and 3. 


Proof of Lemma 2: The settings $\alpha$ and $\beta$ differ on $S' \subseteq S$; suppose first that they differ in a bit of $S' \cap X$.


Figure 3.1: The integers modulo $2^n$. In order for $x_{\alpha+y_k}y_{\alpha+y_k}$ and $x_{\beta+y_k}y_{\beta+y_k}$ to fall into different segments, we must choose $k$ so that $2^k(x_\alpha - x_\beta)$ has large magnitude.


The proof is most easily explained by picturing the integers modulo $2^n$ on a circle. Partition the circle into $2^{m+3}$ equal-sized segments according to the values of the $m+3$ highest bits, so each segment contains $2^{n-m-3}$ consecutive integers, as depicted in Figure 3.1. The hypothesis of the lemma is that $x_\alpha y_\alpha$ and $x_\beta y_\beta$ fall into the same segment, so they differ by less than one segment. If we set bit $y_k \in \bar{S} \cap Y$ to 1, we obtain the products $x_{\alpha+y_k}y_{\alpha+y_k} = x_\alpha y_\alpha + x_\alpha 2^k$ and $x_{\beta+y_k}y_{\beta+y_k} = x_\beta y_\beta + x_\beta 2^k$. The product $x_{\alpha+y_k}y_{\alpha+y_k}$ is obtained by a translation of $2^k x_\alpha$ along the circle from $x_\alpha y_\alpha$, and $x_{\beta+y_k}y_{\beta+y_k}$ is obtained by a translation of $2^k x_\beta$ from $x_\beta y_\beta$. If, modulo $2^n$, their difference $2^k(x_\alpha - x_\beta)$ is at least $2^{n-m-2}$, or two segments long, and at most $2^n - 2^{n-m-2}$, or "negative two" segments long, then the translates $x_{\alpha+y_k}y_{\alpha+y_k}$ and $x_{\beta+y_k}y_{\beta+y_k}$ are more than one segment apart in either direction around the circle, so they fall into different segments. It follows that the products $x_{\alpha+y_k}y_{\alpha+y_k}$ and $x_{\beta+y_k}y_{\beta+y_k}$ differ in a high-order bit position.


It only remains to show how to choose $y_k \in \bar{S} \cap Y$ so that $2^{n-m-2} \le 2^k(x_\alpha - x_\beta) \le 2^n - 2^{n-m-2}$ modulo $2^n$. Let $\tilde{x} = x_\alpha - x_\beta$. It is useful now to think in terms of the table generated by the usual grade-school algorithm for multiplying $\tilde{x}$ by $y$, as shown in Figure 3.2.


Figure 3.2: The table generated by the grade-school algorithm for multiplying $\tilde{x} = x_\alpha - x_\beta$ by $y$. We choose a bit $y_k$ to set to 1 so that the least significant 1 in $\tilde{x}$ is shifted into a "high-order" bit position.


In this table, the rows are the partial products, indexed by $y_0, \ldots, y_{n-1}$. The diagonals are indexed by $\tilde{x}_{n-1}, \ldots, \tilde{x}_0$. Since $\alpha$ and $\beta$ differ in a bit of $S' \cap X \subseteq \{x_0, \ldots, x_{n-1-i}\}$, the difference $\tilde{x} = x_\alpha - x_\beta$ must have a 1 somewhere in the range of bit positions $[0, n-1-i]$. Let $j$ be the position of the least significant 1 in $\tilde{x}$, so that either there is a 0 in position $j-1$, or $j = 0$. We now choose any variable of $\bar{S} \cap Y$ with index $k$ in the range $[(n-1)-j-m, (n-1)-j]$. This range must contain a variable $y_k \in \bar{S} \cap Y$: if $j \le n-m-1$, the range has $m+1$ elements but $|S \cap Y| < m$; if $j \ge n-m$, we may choose $k = i$ (by definition, $y_i \notin S$), which lies in the range $[0, n-1-j]$ since $j \le n-1-i$. This ensures that $2^k\tilde{x}$ has a 1 in position $j+k$ and a 0 in position $j+k-1$, where $n-1-m \le j+k \le n-1$. It follows that modulo $2^n$, we have $2^{n-m-1} \le 2^k\tilde{x} \le 2^n - 1 - 2^{n-m-2}$, the upper bound attained if all bits except bit $j+k-1$ are 1's and $j+k = n-1-m$. This satisfies the desired bounds.


If $\alpha$ and $\beta$ differ in a bit of $S' \cap Y \subseteq \{y_0, \ldots, y_{i-1}\}$, the proof is essentially the same. We have to choose $x_k \in \bar{S} \cap X$ so that $2^{n-m-1} \le 2^k(y_\alpha - y_\beta) \le 2^n - 2^{n-m-2} - 1$ modulo $2^n$. In this case, we know $\tilde{y} = y_\alpha - y_\beta$ has a 1 in the range $[0, i-1]$. Again letting $j$ be the position of the least significant 1 of $\tilde{y}$ in this range, we simply choose $k$ anywhere in the range $[n-1-j-m, n-1-j]$. Since $j \le i-1 \le m$ and $n \ge \sqrt[3]{n} \ge 1+j+m$, this range always has $m+1$ elements, so it contains a variable $x_k \notin S$ (recall $|S \cap X| = m$). It follows as before that $2^k\tilde{y}$ satisfies the desired inequality.

This completes the proof. ∎
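The arithmetic of the shift can be confirmed exhaustively for small $n$ and $m$. The Python sketch below (our own) treats $\tilde{x}$ as a nonzero residue modulo $2^n$ and checks that every $k$ in $[(n-1)-j-m, (n-1)-j]$ puts $2^k\tilde{x} \bmod 2^n$ at least two segments (of $2^{n-m-3}$ integers each) away from 0 in both directions. It checks only this inequality, not the availability of a variable $y_k \notin S$, which the proof handles separately.

```python
def lowest_one(v):
    """Position of the least significant 1 of v > 0."""
    return (v & -v).bit_length() - 1


def shift_ok(xt, k, n, m):
    """Is 2^k * xt (mod 2^n) at least two segments from 0 in both directions,
    where a segment is 2^(n-m-3) consecutive integers?"""
    v = (xt << k) % (1 << n)
    two_segments = 1 << (n - m - 2)
    return two_segments <= v <= (1 << n) - two_segments


def check_lemma2_choice(n, m):
    """Confirm that every k in [(n-1)-j-m, (n-1)-j] works for every nonzero xt,
    where j is the position of the least significant 1 of xt."""
    for xt in range(1, 1 << n):
        j = lowest_one(xt)
        for k in range(max(0, n - 1 - j - m), n - j):
            if not shift_ok(xt, k, n, m):
                return False
    return True


print(check_lemma2_choice(10, 2), check_lemma2_choice(12, 3))  # True True
```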


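Lemma 3 Let $T \subseteq X \cup Y$, and let $\alpha$ and $\beta$ be two settings to $T$. Let $d$ be the greatest index in $[0, n-2]$ such that $(x_\alpha y_\alpha)_d \ne (x_\beta y_\beta)_d$. If $d \ge n-m-3$ and $\max(|T \cap X|, |T \cap Y|) = t \le 3m$, then there are two variables, $x_u \in X \setminus T$ and $y_v \in Y \setminus T$, such that

$$(x_{\alpha'}\, y_{\alpha'})_{d+1} \ne (x_{\beta'}\, y_{\beta'})_{d+1},$$

where $\alpha' = \alpha + x_u + y_v$ and $\beta' = \beta + x_u + y_v$.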


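Proof of Lemma 3: We will consider all pairs of variables $(x_u, y_v)$ such that $u + v = d$. We want $(x_{\alpha'}y_{\alpha'})_{d+1} \ne (x_{\beta'}y_{\beta'})_{d+1}$, where

$$x_{\alpha'}y_{\alpha'} = (x_\alpha + 2^u)(y_\alpha + 2^v) = (x_\alpha y_\alpha + 2^d) + (2^v x_\alpha + 2^u y_\alpha),$$

and

$$x_{\beta'}y_{\beta'} = (x_\beta + 2^u)(y_\beta + 2^v) = (x_\beta y_\beta + 2^d) + (2^v x_\beta + 2^u y_\beta).$$

Since $d$ is the highest bit in which $x_\alpha y_\alpha$ and $x_\beta y_\beta$ differ, clearly $(x_\alpha y_\alpha + 2^d)_{d+1} \ne (x_\beta y_\beta + 2^d)_{d+1}$: the two products agree in bit $d+1$, and adding $2^d$ produces a carry into bit $d+1$ for exactly the one that has a 1 in bit $d$. We will choose $u$ and $v$ so that the addition of the "cross terms" $2^v x_\alpha + 2^u y_\alpha$ to $x_\alpha y_\alpha + 2^d$ does not affect bits $d$ or $d+1$ of $x_\alpha y_\alpha + 2^d$ (and similarly for $\beta$). In order to do this, we choose $u$ and $v$ so that in each case, the cross terms have 0's in bit positions $d$ and $d+1$ and furthermore, in the addition of the two integers, there is no carry bit into position $d$.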
Figure 3.3: In Lemma 3, we choose $x_u$ and $y_v$ to set to 1 so that $u + v = d$ and also so that the products $2^u y_\alpha$ and $2^v x_\alpha$ have 0's in bit positions $d+1, \ldots, i-1$, so that when added to $x_\alpha y_\alpha + 2^d$, they do not cause a carry to propagate into position $d+1$.


To accomplish this, we first find the largest bit position $i$ less than $d$ where $x_\alpha y_\alpha$ has a 0 (so positions $i+1$ through $d-1$ are all 1's). We will choose $u$ and $v$ so that $2^v x_\alpha$ and $2^u y_\alpha$ each has 0's in positions $i-1$ through $d+1$. It follows that their sum then has 0's in positions $i$ through $d+1$, and so, when added to $x_\alpha y_\alpha + 2^d$, which has a 0 in position $i$, causes no carry into any position $i+1$ through $d$ (see Figure 3.3). We will choose $u$ and $v$ so that the same conditions hold for $\beta$ as well.
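The mechanism can be tested on random examples. The Python sketch below is our own illustration with a simplification that we flag loudly: a variable counts as "in $T$" exactly when either setting gives it the value 1 (the real $T$ may also contain variables set to 0). For each random pair of sparse settings it computes $d$ and $i$, searches the pairs $u + v = d$ for one whose four cross terms are 0 from position $i-1$ through $d+1$, and checks that the new products then differ in bit $d+1$.

```python
import random


def bit(v, i):
    return (v >> i) & 1


def zeros_in(v, lo, hi):
    """True if bits lo..hi (inclusive) of v are all 0."""
    return ((v >> lo) & ((1 << (hi - lo + 1)) - 1)) == 0


def low_end(p, d):
    """i - 1 (clamped at 0), where i is the largest position below d at which p has a 0."""
    i = d - 1
    while i >= 0 and bit(p, i):
        i -= 1
    return max(0, i - 1)


def lemma3_pair(xa, ya, xb, yb, d):
    """Search u + v = d for cross terms 2^v x and 2^u y that are 0 from i - 1 to d + 1."""
    lo_a, lo_b = low_end(xa * ya, d), low_end(xb * yb, d)
    for u in range(d + 1):
        v = d - u
        if bit(xa | xb, u) or bit(ya | yb, v):
            continue  # simplification: a variable set to 1 counts as "in T"
        if (zeros_in(xa << v, lo_a, d + 1) and zeros_in(ya << u, lo_a, d + 1)
                and zeros_in(xb << v, lo_b, d + 1) and zeros_in(yb << u, lo_b, d + 1)):
            return u, v
    return None


def sparse(rng, ones=2, width=40):
    return sum(1 << rng.randrange(width) for _ in range(ones))


def run_trials(trials=300, n=128, seed=0):
    """Count trials where a pair was found; assert bit d + 1 then differs."""
    rng = random.Random(seed)
    found = 0
    for _ in range(trials):
        xa, ya, xb, yb = (sparse(rng) for _ in range(4))
        diff = (xa * ya) ^ (xb * yb)  # products are < 2^82 < 2^n here
        if diff == 0 or diff.bit_length() - 1 > n - 2:
            continue
        d = diff.bit_length() - 1
        pair = lemma3_pair(xa, ya, xb, yb, d)
        if pair is None:
            continue
        u, v = pair
        pa = (xa + (1 << u)) * (ya + (1 << v))
        pb = (xb + (1 << u)) * (yb + (1 << v))
        assert bit(pa, d + 1) != bit(pb, d + 1)
        found += 1
    return found


print(run_trials())
```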


A simple counting argument now shows that there exist $u$ and $v$ as desired. First, we claim that $x_\alpha y_\alpha$ (and $x_\beta y_\beta$) has 1's in at most $t^2$ bit positions, so that $(d-1) - i \le t^2$. In general, if the binary representations of integers $p$ and $q$ have $w(p)$ and $w(q)$ 1's in them respectively, then clearly $p + q$ has at most $w(p) + w(q)$ 1's in it. Recall $\alpha$ sets at most $t$ bits in $X$ or $Y$. We may therefore view $x_\alpha y_\alpha$ as the addition of at most $t$ shifts of $x_\alpha$, each containing at most $t$ 1's, and the claim follows.


We require $(2^v x_\alpha)_j = (2^v x_\beta)_j = 0$ in at most $t^2 + 4$ positions $j$: $j = d+1, d, d-1, \ldots, i, i-1$. There are at most $t$ bit positions in which either $x_\alpha$ or $x_\beta$ has a 1, and for each such 1, there are at most $t^2 + 4$ "bad" values of $v \in [0, n-1]$ that shift the 1 to a position we require to be 0. Thus, $x_\alpha$ and $x_\beta$ rule out at most $t(t^2+4)$ values of $v$. Furthermore, there are up to $t$ variables of $Y$ that are in $T$, making a total of $t(t^2+4) + t$ values of $v$ that we may not choose. Similarly, a total of at most $t(t^2+4) + t$ values of $u$ are ruled out by $y_\alpha$, $y_\beta$, and $T$. The number of pairs $(x_u, y_v)$ in which either $x_u$ or $y_v$ has been ruled out is thus at most

$$2(t^3 + 5t) \le 2(27m^3 + 15m) = 2\left(\frac{27n}{64} + \frac{15\sqrt[3]{n}}{4}\right)$$

since $t \le 3m$ and $m = \sqrt[3]{n}/4$. There are at least $d + 1 \ge n - m - 2$ pairs $(x_u, y_v)$ such that $u + v = d$. Thus we retain at least

$$n - \frac{\sqrt[3]{n}}{4} - 2 - 2\left(\frac{27n}{64} + \frac{15\sqrt[3]{n}}{4}\right) = \Omega(n)$$

good pairs satisfying the desired requirements for $x_u$ and $y_v$. For $n > 378$, this expression is greater than 1, implying that there exists a pair as desired. ∎
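The final count simplifies to $5n/32 - 31\sqrt[3]{n}/4 - 2$, and the threshold can be located numerically. The Python sketch below (our own) evaluates the expression above, finds the first $n$ at which it exceeds 1, and shows the leading constant $5/32$; the result is consistent with the claim for $n > 378$.

```python
def retained_pairs(n):
    """n - m - 2 - 2(27n/64 + 15 n^(1/3)/4), the good pairs left in Lemma 3,
    with m = n^(1/3) / 4."""
    c = n ** (1 / 3)
    return n - c / 4 - 2 - 2 * (27 * n / 64 + 15 * c / 4)


threshold = next(n for n in range(1, 10_000) if retained_pairs(n) > 1)
print(threshold)                      # 378
print(retained_pairs(10**9) / 10**9)  # close to 5/32 = 0.15625
```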


3.3 Improving the bound to $2^{\Omega(\sqrt{n})}$


We can improve the lower bound to $2^{\Omega(\sqrt{n})}$ by analyzing more closely how we iterate Lemma 3 in the proof of the theorem. We begin with the observation that we needed $m = O(\sqrt[3]{n})$ because in Lemma 3, we used $t^2 = O(m^2)$ as an upper bound on the number of consecutive 1's to the right of position $d$ in $x_\alpha y_\alpha$ or $x_\beta y_\beta$. We then required 0's in these $O(m^2)$ positions in the cross terms $2^v x_\alpha + 2^u y_\alpha$ and $2^v x_\beta + 2^u y_\beta$. Since each of the $O(m)$ 1's in $x_\alpha$ may then rule out $O(m^2)$ values of $v$, we needed $O(m^3) \le n$ in order not to rule out all values of $v$. In order to allow $m = O(\sqrt{n})$, we will reduce to $O(m)$ the number of positions in which we require 0's in the cross terms. For the rest of this section, we let $m = \sqrt{n}/3$.


For example, if we knew that $x_\alpha y_\alpha$ and $x_\beta y_\beta$ looked like²

$$\begin{aligned} x_\alpha y_\alpha &= \cdots 10 \cdots \\ x_\beta y_\beta &= \cdots 00 \cdots \end{aligned}$$

(with the left displayed bit in position $d$), then we would need to require 0's in the cross terms in only three positions: $d+1$, $d$, and $d-1$. This is sufficient to ensure that the addition of cross term $2^v x_\alpha + 2^u y_\alpha$ to $x_\alpha y_\alpha + 2^d$ does not generate a carry into position $d$ and does not affect bits $d$ or $d+1$ of $x_\alpha y_\alpha + 2^d$. The same holds for $\beta$ and we get $(x_{\alpha'}y_{\alpha'})_{d+1} \ne (x_{\beta'}y_{\beta'})_{d+1}$. With only these three positions required to be 0's, the total number of $v$'s ruled out by $x_\beta$ and $x_\alpha$ is proportional to the number of 1's they contain, which is $O(m)$.

²Here and henceforth, "$\cdots$" denotes an arbitrary string of 0's and 1's; thus $x_\alpha y_\alpha = \cdots 10 \cdots$ has a 1 in bit $d$, a 0 in bit $d-1$, and may have any values in other bit positions.

Similarly, the cases

$$\begin{aligned} x_\alpha y_\alpha &= \cdots 11 \cdots \\ x_\beta y_\beta &= \cdots 00 \cdots \end{aligned} \qquad \text{and} \qquad \begin{aligned} x_\alpha y_\alpha &= \cdots 11 \cdots \\ x_\beta y_\beta &= \cdots 01 \cdots \end{aligned}$$

(again with the left displayed bit in position $d$) can be handled with only a few constraints by choosing $u + v = d - 1$ (this will be proved in Lemma 5). In fact, there is really only one case in which we need to require $(2^v x_\beta + 2^u y_\beta)$ or $(2^v x_\alpha + 2^u y_\alpha)$ to have many 0's:


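Definition 4 Let $d$ be the greatest index less than $n$ in which $(x_\alpha y_\alpha)_d \ne (x_\beta y_\beta)_d$. We say that $x_\alpha y_\alpha$ and $x_\beta y_\beta$ are $k$-bad if $d \ge n-m-4$ and the products look like

$$\begin{aligned} x_\alpha y_\alpha &= \cdots 10 \cdots\cdots\cdots \\ x_\beta y_\beta &= \cdots 0111 \cdots 1\,1 \cdots 1 \cdots \end{aligned}$$

where the run of 1's in $x_\beta y_\beta$ begins in position $d-1$ and extends at least through position $n-m-6-k$ (that is, $k$ positions past position $n-m-6$), or vice versa (exchanging $\alpha$ and $\beta$).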
In this case, say $x_\beta y_\beta = \cdots 0111111 \cdots$ with the run of 1's extending past position $n-m-6$, we must require $2^v x_\beta + 2^u y_\beta$ to be 0 in the positions of each of these 1's in order to prevent a carry into position $d+1$ when we add it to $x_\beta y_\beta + 2^d$. In order to allow $m = O(\sqrt{n})$, we will ensure that the products are not $k$-bad for $k > m+4$. Then the number of $v$'s ruled out by each 1 of $x_\alpha$ and $x_\beta$ is at most $2m+10$, and as long as the number of 1's in $x_\alpha$ or $x_\beta$ is $O(m)$, the total number of $v$'s ruled out is $O(m^2)$.


We will first show that we may begin with products that differ in a high-order bit but are not 1-bad, and then prove a version of Lemma 3 in which each application allows the "badness" to grow by at most 1.


Lemma 4 For any two settings $\alpha$ and $\beta$ to $S$ that differ on a bit of $S'$, there are three (or fewer) variables $x_u, y_v, z \in \bar{S}$ such that for $\alpha' = \alpha + x_u + y_v + z$ and $\beta' = \beta + x_u + y_v + z$, the products $x_{\alpha'}y_{\alpha'}$ and $x_{\beta'}y_{\beta'}$ differ in a high-order bit (in the range $[n-m-4, n-1]$) and moreover, are not 1-bad.
(The comment "or fewer" refers to the fact that we may not need to set some or any of these three variables.)


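Lemma 5 Let $T \subseteq X \cup Y$, and let $\alpha$ and $\beta$ be two settings to $T$. Let $d$ be the greatest index in $[0, n-2]$ such that $(x_\alpha y_\alpha)_d \ne (x_\beta y_\beta)_d$. Suppose $d \ge n-m-4$ and $\max(|T \cap X|, |T \cap Y|) = t \le 2m+5$ and also that $x_\alpha y_\alpha$ and $x_\beta y_\beta$ are not $k$-bad, for some $k \le m+4$. Then there are two variables, $x_u, y_v \notin T$, such that

$$(x_{\alpha'}\, y_{\alpha'})_{d+1} \ne (x_{\beta'}\, y_{\beta'})_{d+1}$$

for $\alpha' = \alpha + x_u + y_v$ and $\beta' = \beta + x_u + y_v$, and moreover, $x_{\alpha'}y_{\alpha'}$ and $x_{\beta'}y_{\beta'}$ are not $(k+1)$-bad.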


We now have

Theorem 5 Any read-once branching program for MULT has size $2^{\Omega(\sqrt{n})}$.


Proof: The proof is exactly the same as the proof of Theorem 4 except for the lemmas. We start with products that differ in a high-order bit but are not 1-bad, as provided by Lemma 4. The number of variables in $X$ or $Y$ set in these products is at most $m+2$. We obtain a difference in bit $n-1$ by iterating Lemma 5 at most $m+3$ times, each time setting at most one variable in $X$ and in $Y$. This maintains $t \le (m+2) + (m+3) = 2m+5$ and $k \le 1 + (m+3) = m+4$ as required. ∎


We now give the proofs of Lemmas 4 and 5, which we restate for convenience. 


Lemma 4 For any two settings $\alpha$ and $\beta$ to $S$ that differ on a bit of $S'$, there are three (or fewer) variables $x_u, y_v, z \in \bar{S}$ such that for $\alpha' = \alpha + x_u + y_v + z$ and $\beta' = \beta + x_u + y_v + z$, the products $x_{\alpha'}y_{\alpha'}$ and $x_{\beta'}y_{\beta'}$ differ in a high-order bit (in the range $[n-m-4, n-1]$) and moreover, are not 1-bad.


Proof: Either $x_\alpha y_\alpha$ and $x_\beta y_\beta$ differ (modulo $2^n$) by at least $2^{n-m-3}$ or not. If they do, then they must differ in a high-order bit (in the range $[n-m-4, n-1]$). If not, we proceed just as in Lemma 2 to find a variable $z$ such that $x_{\alpha+z}y_{\alpha+z}$ and $x_{\beta+z}y_{\beta+z}$ differ by at least $2^{n-m-3}$. As in Lemma 2, when $\alpha$ and $\beta$ differ in a bit of $S' \cap X$, it is sufficient to set to 1 a variable $y_k \in \bar{S} \cap Y$ such that $2^k(x_\alpha - x_\beta)$ is at least $2^{n-m-2}$, or two segments long, and at most $2^n - 2^{n-m-2}$, or "negative two" segments long. Since $x_\alpha y_\alpha$ and $x_\beta y_\beta$ differ by less than one segment ($2^{n-m-3}$), the translates $x_{\alpha+y_k}y_{\alpha+y_k}$ and $x_{\beta+y_k}y_{\beta+y_k}$ differ by more than $2^{n-m-3}$ and must fall into different segments. The rest of the proof follows exactly as before. In order to avoid overly cumbersome notation, let us abuse it slightly by calling the products $x_\alpha y_\alpha$ and $x_\beta y_\beta$, even though they should possibly be called $x_{\alpha+z}y_{\alpha+z}$ and $x_{\beta+z}y_{\beta+z}$.


Now that we know the products differ in a high-order bit, it remains to ensure that they are not 1-bad. Assume they are. Let $d$ be the greatest index less than $n$ of a bit position in which $x_\alpha y_\alpha$ and $x_\beta y_\beta$ differ.


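First, we claim that if the products are 1-bad, then in fact $d \ge n-m-2$. Because if, say, $d = n-m-3$, then the products look like³

$$\begin{aligned} x_\alpha y_\alpha &= \cdots 10 \cdots \\ x_\beta y_\beta &= \cdots 0111 \cdots 1 \cdots \end{aligned}$$

(with the run of 1's extending one position past position $n-m-6$), and therefore they differ modulo $2^n$ by at most $2^{n-m-7} + (2^{n-m-4} - 1)$ (since they agree in bits $d+1$ through $n-1$), but we know they differ by at least $2^{n-m-3}$. Furthermore, by the same reasoning, not only is $d \ge n-m-2$, but $x_\alpha y_\alpha$ must have a 1 in some position between $d-2$ and $n-m-4$ inclusive (note that $(x_\alpha y_\alpha)_{d-1} = 0$; else the products are not 1-bad). For otherwise, the products look like

$$\begin{aligned} x_\alpha y_\alpha &= \cdots 100000 \cdots \\ x_\beta y_\beta &= \cdots 011111 \cdots 1 \cdots \end{aligned}$$

(again with the run of 1's extending one position past $n-m-6$), and thus they differ modulo $2^n$ by at most $2^{n-m-7} + 2^{n-m-4} - 1$, a contradiction.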


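So we are reduced to the case that the products are 1-bad, differ in position $d \ge n-m-2$, and $x_\alpha y_\alpha$ has a 1 in some position between $d-2$ and $n-m-4$. Let $\ell$ be the highest index of a 1 in this range: $x_\alpha y_\alpha = \cdots 1000 \cdots 1 \cdots$, with the leading 1 in position $d$ and the next 1 in position $\ell$. We will find a pair of variables $(x_u, y_v)$ with $u + v = n-m-6$ so that the cross terms $2^v x_\alpha$, $2^v x_\beta$, $2^u y_\alpha$, $2^u y_\beta$ all have 0's in positions $n-m-8$ through $n-1$. Then $(2^{u+v} + 2^v x_\alpha + 2^u y_\alpha)$ and $(2^{u+v} + 2^v x_\beta + 2^u y_\beta)$ both look like $\cdots 0000000\,1\,0 \cdots$, with 0's in positions $n-1$ through $n-m-5$, a 1 in position $n-m-6$, and a 0 in position $n-m-7$. We see that $x_{\alpha'}y_{\alpha'}$ looks like either $\cdots 10 \cdots 01 \cdots$ (1's in positions $d$ and $\ell$) if there is no carry into position $\ell$ when $2^{u+v} + 2^v x_\alpha + 2^u y_\alpha$ is added to $x_\alpha y_\alpha$, or $\cdots 10 \cdots 010 \cdots$ (1's in positions $d$ and $\ell+1$) if there is a carry into position $\ell$. Meanwhile,

$$x_{\beta'}y_{\beta'} = \cdots 0111111 \cdots + \cdots 0000001 \cdots$$

looks like $\cdots 10000000 \cdots$ or $\cdots 10000001 \cdots$ (with the last displayed bit in position $n-m-6$), depending on whether there is a carry into position $n-m-6$ in this addition.

Since $\ell \le d-2$ and $\ell - 1 \ge n-m-5$, the product $x_{\beta'}y_{\beta'}$ has 0's in positions $\ell+1$, $\ell$, and $\ell-1$, and we see that

$$\begin{aligned} x_{\alpha'}y_{\alpha'} &= \cdots 100001 \cdots \\ x_{\beta'}y_{\beta'} &= \cdots 100000 \cdots \end{aligned}$$

(with the leading 1's in position $d$ and the last displayed bit in position $d'$), where $d'$ is either $\ell$ or $\ell+1$. Furthermore, the products agree in all higher bits up to $n-1$ because by the definition of $d$, $x_\alpha y_\alpha$ and $x_\beta y_\beta$ agree in bits $d+1$ through $n-1$ and we chose $x_u$ and $y_v$ so that the cross terms have 0's in these positions. Since $\ell \ge n-m-4$, it follows that $x_{\alpha'}y_{\alpha'}$ and $x_{\beta'}y_{\beta'}$ differ in a high-order bit and are not even 1-bad.

³Without loss of generality, let us assume that in position $d$, $x_\alpha y_\alpha$ has a 1 and $x_\beta y_\beta$ has a 0.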


A counting argument like that for Lemma 3 shows that we may choose $x_u$ and $y_v$ as needed. We require the cross terms to have 0's in at most $m+8$ positions. Since at most $m+1$ bits are set to 1 in $x_\alpha$ or $x_\beta$, the total number of values $v$ that we may not choose is $(m+1)(m+8) + (m+1)$. The same number of values $u$ are ruled out, making a total of at most $2(m+1)(m+9) = 2n/9 + O(\sqrt{n})$ pairs $(x_u, y_v)$ that are ruled out. Since there are $n-m-5$ pairs to choose from initially, we retain $\Omega(n)$ pairs.


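Lemma 5 Let $T \subseteq X \cup Y$, and let $\alpha$ and $\beta$ be two settings to $T$. Let $d$ be the greatest index in $[0, n-2]$ such that $(x_\alpha y_\alpha)_d \ne (x_\beta y_\beta)_d$. Suppose $d \ge n-m-4$ and $\max(|T \cap X|, |T \cap Y|) = t \le 2m+5$ and also that $x_\alpha y_\alpha$ and $x_\beta y_\beta$ are not $k$-bad, for some $k \le m+4$. Then there are two variables, $x_u, y_v \notin T$, such that

$$(x_{\alpha'}\, y_{\alpha'})_{d+1} \ne (x_{\beta'}\, y_{\beta'})_{d+1}$$

for $\alpha' = \alpha + x_u + y_v$ and $\beta' = \beta + x_u + y_v$, and moreover, $x_{\alpha'}y_{\alpha'}$ and $x_{\beta'}y_{\beta'}$ are not $(k+1)$-bad.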


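Proof: We have four possible cases (up to switching $\alpha$ and $\beta$), according to bits $d$ and $d-1$ of the two products:

(1) $x_\alpha y_\alpha = \cdots 10 \cdots$, $x_\beta y_\beta = \cdots 00 \cdots$;
(2) $x_\alpha y_\alpha = \cdots 11 \cdots$, $x_\beta y_\beta = \cdots 00 \cdots$;
(3) $x_\alpha y_\alpha = \cdots 11 \cdots$, $x_\beta y_\beta = \cdots 01 \cdots$;
(4) $x_\alpha y_\alpha = \cdots 10 \cdots$, $x_\beta y_\beta = \cdots 0111110 \cdots$;

where in each case the left displayed bit is in position $d$.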
By assumption, $d \ge n-m-4$.

Case 1: $x_\alpha y_\alpha = \cdots 10 \cdots$, $x_\beta y_\beta = \cdots 00 \cdots$. It is sufficient to choose $(x_u, y_v)$ so that $u + v = d$ and each of the cross terms $2^v x_\beta$, $2^u y_\beta$, $2^v x_\alpha$, and $2^u y_\alpha$ has 0's in positions $d-3$ through $d+1$. Then the sums $2^v x_\beta + 2^u y_\beta$ and $2^v x_\alpha + 2^u y_\alpha$ have 0's in positions $d-2$ through $d+1$. Adding these to $x_\alpha y_\alpha$ and $x_\beta y_\beta$ respectively therefore causes no carry into position $d$, and thus the addition of $2^{u+v} = 2^d$ causes a carry into bit $d+1$ for $\alpha$ but not for $\beta$. Since $x_\alpha y_\alpha$ and $x_\beta y_\beta$ agree in bits $d+1$ through $n-1$, this carry bit causes them to differ in bit $d+1$ and possibly higher bits as well.


We now verify that $x_{\alpha'}y_{\alpha'}$ and $x_{\beta'}y_{\beta'}$ are not 1-bad. We know that $2^{u+v} + 2^v x_\beta + 2^u y_\beta$ looks like $\cdots 0\underset{d}{1}00 \cdots$. Thus $x_{\beta'}y_{\beta'} = (\cdots\underset{d}{0}0\cdots) + (\cdots 0\underset{d}{1}00\cdots)$ looks like either $\cdots\underset{d}{1}0\cdots$ or $\cdots\underset{d}{1}10\cdots$, depending on whether there is a carry into position $d-1$. Thus $x_{\beta'}y_{\beta'}$ does not have a string of 1's extending past position $d-1 > n-m-5$ and cannot make the products even 1-bad. Since the products differ in position $d+1$ or higher and $x_{\alpha'}y_{\alpha'}$ has a 0 in position $d$, the products cannot be 1-bad due to a string of 1's in $x_{\alpha'}y_{\alpha'}$.


To see that we can choose $(x_u, y_v)$ as desired, we argue as in the proof of Lemma 3. The number of positions required to be 0 is 5, ruling out at most $5t$ values of $v$, in addition to the at most $t$ values of $v$ with $y_v$ already in $T$. Of the $d+1 = n - O(\sqrt{n})$ pairs $(x_u, y_v)$ such that $u + v = d$, the number of pairs ruled out is at most $2(5t + t) = 12t < 12(2m+5) = O(\sqrt{n})$, so there are $\Omega(n)$ remaining pairs to choose from.
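The counting in Case 1 can be made concrete with a small brute-force sketch. It is a hypothetical illustration, not part of the proof. The function `admissible_pairs` and the sets `tx` and `ty` (the bit positions of the $x$- and $y$-variables already in $T$, so that any setting $\alpha$ or $\beta$ has its 1's only there) are names introduced here, and bit positions are numbered from 0 at the least significant end. Each position $i$ in `tx` forbids the 5 values of $v$ with $i + v$ in the window $d-3, \ldots, d+1$, which is where the bound $2(5t + t) = 12t$ comes from.

```python
def admissible_pairs(n, d, tx, ty, lo, hi):
    """Return the pairs (u, v) with u + v = d such that x_u and y_v are not
    already in T and every cross term 2^v * x and 2^u * y is 0 in positions
    lo..hi, for any settings x, y whose 1's lie in positions tx, ty."""
    pairs = []
    for u in range(d + 1):
        v = d - u
        if u >= n or v >= n or u in tx or v in ty:
            continue  # out of range, or the variable is already in T
        # bit i of x lands in position i + v of 2^v * x
        if any(lo <= i + v <= hi for i in tx):
            continue
        # bit j of y lands in position j + u of 2^u * y
        if any(lo <= j + u <= hi for j in ty):
            continue
        pairs.append((u, v))
    return pairs


if __name__ == "__main__":
    n, d = 64, 60
    tx, ty = {3, 10, 40}, {5, 22, 50}  # hypothetical T, so t = 3
    t = max(len(tx), len(ty))
    pairs = admissible_pairs(n, d, tx, ty, d - 3, d + 1)
    print(len(pairs), ">=", (d + 1) - 12 * t)  # 35 >= 25
```

In this example 26 of the 61 candidate pairs are ruled out, comfortably within the worst-case bound of $12t = 36$, because the forbidden windows overlap.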


Cases 2 and 3: In Case 2, $x_\alpha y_\alpha = \cdots\underset{d}{1}1\cdots$ and $x_\beta y_\beta = \cdots\underset{d}{0}0\cdots$; in Case 3, $x_\alpha y_\alpha = \cdots\underset{d}{1}1\cdots$ and $x_\beta y_\beta = \cdots\underset{d}{0}1\cdots$.

It is sufficient to choose $(x_u, y_v)$ as in Case 1 except that $u + v = d-1$. Adding $2^{d-1}$ will cause a carry to propagate into position $d+1$ for $\alpha$ but not for $\beta$, causing them to differ in bit $d+1$ and possibly higher bits as well. The counting argument for choosing $(x_u, y_v)$ is exactly the same as in Case 1 except that there is one fewer pair $(x_u, y_v)$ with $u + v = d-1$.


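It only remains to show that in fact $x_{\alpha'}y_{\alpha'}$ and $x_{\beta'}y_{\beta'}$ are not 1-bad. Now $2^{u+v} + 2^v x_\alpha + 2^u y_\alpha$ looks like $\cdots 0\underset{d}{0}10 \cdots$, and so does $2^{u+v} + 2^v x_\beta + 2^u y_\beta$. Thus $x_{\alpha'}y_{\alpha'} = (\cdots\underset{d}{1}1\cdots) + (\cdots 0\underset{d}{0}10\cdots)$, and we see that it has a 0 in bit $d$.

Looking now at $x_{\beta'}y_{\beta'}$, we see that in Case 2, $x_{\beta'}y_{\beta'} = (\cdots\underset{d}{0}0\cdots) + (\cdots 0\underset{d}{0}10\cdots)$ looks like either $\cdots\underset{d}{0}1\cdots$ or $\cdots\underset{d}{1}0\cdots$, depending on whether there is a carry into position $d-1$. In Case 3, $x_{\beta'}y_{\beta'} = (\cdots\underset{d}{0}1\cdots) + (\cdots 0\underset{d}{0}10\cdots)$ looks like either $\cdots\underset{d}{1}0\cdots$ or $\cdots\underset{d}{1}10\cdots$, depending on whether there is a carry into position $d-1$. In any case, $x_{\beta'}y_{\beta'}$ does not have a string of 1's extending past $d-2 > n-m-6$, and so $x_{\alpha'}y_{\alpha'}$ and $x_{\beta'}y_{\beta'}$ are not even 1-bad.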


Case 4: $x_\alpha y_\alpha = \cdots\underset{d}{1}0\cdots$ and $x_\beta y_\beta = \cdots\underset{d}{0}11\cdots1\,0\cdots$, where $k-1$ of the consecutive 1's in $x_\beta y_\beta$ extend past position $n-m-6$.

Without loss of generality, let us say that $x_\beta y_\beta$ contains the maximum number, $k-1$, of consecutive 1's extending past position $n-m-6$. We choose $(x_u, y_v)$ so that $u + v = d$ and the cross terms $2^v x_\alpha$, $2^u y_\alpha$, $2^v x_\beta$, and $2^u y_\beta$ have 0's in positions $(n-m-6)-k-2$ through $n-1$. This will ensure that from $2^d$ we get a carry into position $d+1$ for $\alpha'$ but not for $\beta'$, causing the products to differ in bit $d+1$ and possibly higher bits as well.


The sum $2^v x_\beta + 2^u y_\beta$ has 0's in positions $(n-m-6)-k-1$ through $n-1$, so $x_{\beta'}y_{\beta'} = (\cdots\underset{d}{0}1\cdots1\,0\cdots) + (\cdots 0\underset{d}{1}0\cdots0\cdots)$ looks like either $\cdots\underset{d}{1}1\cdots1\,0\cdots$ (with $k-1$ 1's extending past position $n-m-6$) or $\cdots\underset{d}{1}1\cdots11\,0\cdots$ (with $k$ such 1's), depending on whether there is a carry into position $(n-m-6)-k$. So $x_{\beta'}y_{\beta'}$ has at most $k$ 1's extending past position $n-m-6$. The pair of products cannot be worse than $k$-bad because of a longer string of 1's in $x_{\alpha'}y_{\alpha'}$, because the products differ in position $d+1$ or higher and $x_{\alpha'}y_{\alpha'}$ has a 0 in position $d$. Thus $x_{\alpha'}y_{\alpha'}$ and $x_{\beta'}y_{\beta'}$ are at worst $k$-bad.


The number of positions in which we require $2^v x_\alpha$ or $2^v x_\beta$ to be 0 is $m+6+k+2 < 2m+12$. Together, $x_\alpha$ and $x_\beta$ may rule out $t(2m+12)$ values $v$ in addition to the $t$ variables $y_v$ already in $T$. Taking into account the same number of values $u$ ruled out by $y_\alpha$ and $y_\beta$, there are at most $2(t(2m+12) + t)$ pairs $(x_u, y_v)$ that could be ruled out. Of the $d+1 = n - O(\sqrt{n})$ possible pairs $(x_u, y_v)$ with $u + v = d$, a total of at most

$$2(2m+5)(2m+13) = 8m^2 + O(\sqrt{n})$$

pairs are ruled out, leaving $n - 8m^2 - O(\sqrt{n}) = \Omega(n)$ pairs to choose from. For $n > 56{,}000$, we can say there is at least one pair left. ∎


For precision, we have given explicit values of $n$ above which our proofs hold; these numbers most likely reflect our proofs rather than the true complexity, and should not be taken very seriously.
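The carry claims behind the four cases of the proof of Lemma 5 can be checked mechanically. The sketch below is an illustration under simplifying assumptions, not part of the proof. It models only the bits in positions $d$ and $d-1$ of each product as a 2-bit value and treats everything arriving from lower positions as a single carry $c$ into position $d-1$. It then checks that adding $2^d$ (Cases 1 and 4) or $2^{d-1}$ (Cases 2 and 3) produces a carry into position $d+1$ for $\alpha$ but not for $\beta$. For $\beta$ in Case 4 only $c = 0$ is allowed, reflecting our reading that the 0 ending the run of 1's absorbs any carry from below. The names `CASES`, `carries_into_d_plus_1`, and `check_cases` are ours.

```python
# Each product is modelled by a 2-bit value: high bit = position d,
# low bit = position d-1.  e = 1 adds 2^d, e = 0 adds 2^(d-1); c is an
# assumed carry arriving into position d-1 from the lower-order additions.
CASES = {
    # case: (bits of x_a*y_a, bits of x_b*y_b, e, carries for alpha, carries for beta)
    1: (0b10, 0b00, 1, (0, 1), (0, 1)),
    2: (0b11, 0b00, 0, (0, 1), (0, 1)),
    3: (0b11, 0b01, 0, (0, 1), (0, 1)),
    # Case 4: the run of 1's below position d in x_b*y_b ends in a 0 that
    # absorbs any carry, so no carry reaches position d-1 for beta.
    4: (0b10, 0b01, 1, (0, 1), (0,)),
}


def carries_into_d_plus_1(two_bits: int, e: int, c: int) -> bool:
    """True if adding 2^(d-1+e) plus a carry c into position d-1 overflows bit d."""
    return two_bits + (1 << e) + c >= 0b100


def check_cases() -> bool:
    for a_bits, b_bits, e, a_carries, b_carries in CASES.values():
        # alpha' must receive a carry into position d+1 ...
        if not all(carries_into_d_plus_1(a_bits, e, c) for c in a_carries):
            return False
        # ... and beta' must not
        if any(carries_into_d_plus_1(b_bits, e, c) for c in b_carries):
            return False
    return True


if __name__ == "__main__":
    print(check_cases())  # True
    # Without Case 4's extra care, a carry into d-1 would also reach d+1 for beta:
    print(carries_into_d_plus_1(0b01, 1, 1))  # True
```

The last line shows why Case 4 needs the longer range of zero positions in the cross terms, and therefore the larger count of ruled-out pairs.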
3.4 Problem reductions


We may deduce similar lower bounds for other boolean functions by the standard 
technique of problem reduction. In order to preserve read-once complexity, we will 
consider a very restrictive type of problem reduction. We begin with the notion of 


projection reductions [SV81], as defined in [CSV84]: 


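Definition 5 A function $f = \{f_n\}_{n \in \mathbb{N}}$ is projection reducible to a function $g = \{g_n\}_{n \in \mathbb{N}}$, written $f \le_{\mathrm{proj}} g$, if there is a mapping

$$\sigma : \{y_1, \ldots, y_{p(n)}\} \to \{0, 1, x_1, \ldots, x_n, \overline{x}_1, \ldots, \overline{x}_n\}$$

such that

$$f_n(x_1, \ldots, x_n) = g_{p(n)}(\sigma(y_1), \ldots, \sigma(y_{p(n)}))$$

for some function $p(n)$ bounded above by a polynomial in $n$.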


In other words, $f \le_{\mathrm{proj}} g$ if one can use as a black box an algorithm (circuit, branching program) for $g(y_1, \ldots, y_{p(n)})$ simply by substituting the inputs to $f$ (or their negations, or constants) for the inputs to $g$ and then taking the output of the algorithm as the output for $f$. These reductions were used by Chandra, Stockmeyer, and Vishkin [CSV84] in their study of constant-depth reducibility; clearly, given that $f \le_{\mathrm{proj}} g$, if $g \in \mathrm{AC}^0$ then $f \in \mathrm{AC}^0$.


We would like a reduction $\le'$ that allows us to deduce that if $f \le' g$ and $g \in$ READ-1 then $f \in$ READ-1. It is easy to see that projection reductions satisfy this condition if the mapping $\sigma$ is injective with respect to the $x$ variables:


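Definition 6 A function $f$ is read-once reducible to a function $g$, denoted $f \le_{\mathrm{r.o.}} g$, if there is a projection reduction $\sigma$ from $f$ to $g$ in which, for $i \neq j$ with $\sigma(y_i)$ and $\sigma(y_j)$ not constants,

$$\sigma(y_i) \neq \sigma(y_j) \quad \text{and} \quad \sigma(y_i) \neq \overline{\sigma(y_j)}.$$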


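It follows that a read-once branching program for $f(x_1, \ldots, x_n)$ is obtained by relabelling the nodes of a read-once program for $g(y_1, \ldots, y_{p(n)})$: a node labelled $y_i$ is relabelled $\sigma(y_i)$, with its two outgoing edges exchanged if $\sigma(y_i)$ is a negated variable, and it is bypassed along the appropriate edge if $\sigma(y_i)$ is a constant.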
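The relabelling can be sketched in Python. This is a hypothetical sketch, and several choices in it are ours: a branching program is encoded as a dictionary (node mapped to a tuple of variable, successor on 0, and successor on 1, with the integers 0 and 1 as sinks), $\sigma$ maps each $y$-variable either to a constant 0/1 or to a pair (x-variable, negated?), and the function names are illustrative. The example program computes $y_1 \oplus y_2 \oplus y_3$, and the mapping $y_1 \mapsto x_1$, $y_2 \mapsto \overline{x}_2$, $y_3 \mapsto 1$ turns it into a read-once program for $x_1 \oplus x_2$.

```python
def is_read_once_mapping(sigma):
    """True if no x-variable (or its negation) is the image of two y's."""
    xs = [img[0] for img in sigma.values() if not isinstance(img, int)]
    return len(xs) == len(set(xs))


def relabel(bp, root, sigma):
    """Relabel a branching program for g into one for f.

    bp: node -> (variable, successor on 0, successor on 1); sinks are the
    integers 0 and 1.  sigma: y-variable -> 0, 1, or (x_name, negated).
    """
    def skip_constants(node):
        # Follow nodes whose variable sigma fixes to a constant.
        while not isinstance(node, int):
            var, lo, hi = bp[node]
            img = sigma[var]
            if not isinstance(img, int):
                return node
            node = hi if img == 1 else lo
        return node

    new_bp = {}
    for name, (var, lo, hi) in bp.items():
        img = sigma[var]
        if isinstance(img, int):
            continue  # this node is bypassed
        x_name, negated = img
        lo, hi = skip_constants(lo), skip_constants(hi)
        if negated:
            lo, hi = hi, lo  # testing the complement swaps the two edges
        new_bp[name] = (x_name, lo, hi)
    return new_bp, skip_constants(root)


def evaluate(bp, root, assignment):
    node = root
    while not isinstance(node, int):
        var, lo, hi = bp[node]
        node = hi if assignment[var] else lo
    return node


# Read-once program for g(y1, y2, y3) = y1 XOR y2 XOR y3.
G = {
    "a": ("y1", "b0", "b1"),
    "b0": ("y2", "c0", "c1"),
    "b1": ("y2", "c1", "c0"),
    "c0": ("y3", 0, 1),
    "c1": ("y3", 1, 0),
}
SIGMA = {"y1": ("x1", False), "y2": ("x2", True), "y3": 1}

if __name__ == "__main__":
    F, f_root = relabel(G, "a", SIGMA)
    print(is_read_once_mapping(SIGMA))
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, evaluate(F, f_root, {"x1": x1, "x2": x2}))
```

No path in the new program reads an $x$-variable twice, because every path in `G` reads each $y$-variable at most once and `is_read_once_mapping` guarantees that distinct nonconstant $y$'s map to distinct $x$'s.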
3.4.1 Reductions to other arithmetic functions


Projection reductions have also been used to deduce tight lower bounds on the depth of polynomial-size threshold circuits. It was originally proved in [HMPST93] that INNER-PRODUCT-MODULO-2 cannot be computed in polynomial size by threshold circuits of depth 2. It was also noted there that the projection reduction to multiplication (first given in [FSS84], from PARITY to MULT) shows that MULT obeys the same lower bound.


Wegener [We93] gives projection reductions from MULT to squaring and inversion 
in order to show that these functions also require depth 3 polynomial-size threshold 
circuits. The lower bound for the middle bit of multiplication implies a lower bound 
for the appropriate bit of these two functions. We phrase the reductions in [We93] in 


terms of the following Boolean functions: 


• SQUARING : $\{0,1\}^n \to \{0,1\}$ computes "the" middle bit (here, bit $n$ rather than bit $n-1$, which we chose for MULT) in the square of an $n$-bit integer:

$$\mathrm{SQUARING}(z) = (z^2)_n.$$

• INVERSION : $\{0,1\}^n \to \{0,1\}$ computes the ones' bit in the reciprocal of an $n$-bit number between 0 and 1:

$$\mathrm{INVERSION}(x) = y_0,$$

where $x$ represents the number $0.x_1 x_2 \cdots x_n = \sum_i x_i 2^{-i}$ and $y = y_n \cdots y_0$ is the integral part of $1/x$. (Note that $1 \le y \le 2^n$.) Define the function to be 0 if all $x_i$ are 0.


Wegener actually shows that

$$\mathrm{MULT} \le_{\mathrm{proj}} \mathrm{SQUARING} \le_{\mathrm{proj}} \mathrm{INVERSION},$$

except that the reductions are given for all bits of multiplication, squaring, and inversion. Though it is not noted there, we shall see that each reduction is actually a read-once reduction. The polynomial $p(n)$ of the reduction is linear in both cases, implying that if each bit of the function is computable with a read-once program of size $f(n)$, then MULT is computable with a read-once program of size $f(cn)$ for some constant $c$. This gives the following corollaries to Theorem 5:
Corollary 1 Any read-once branching program for computing the function SQUARING has size at least $2^{\Omega(\sqrt{n})}$.

Proof: We verify that the reduction in [We93] shows $\mathrm{MULT} \le_{\mathrm{r.o.}} \mathrm{SQUARING}$ with polynomial $p(n) = 3n+2$. In addition to verifying $p(n)$, we must also check that the reduction is indeed between these two Boolean functions and also that the mapping $\sigma$ is injective. The reduction simply maps the $n$-bit inputs $x, y$ (of MULT) to the $(3n+2)$-bit input $z = x 2^{2(n+1)} + y$ (of SQUARING), so that $z^2 = x^2 2^{4(n+1)} + xy\, 2^{2(n+1)+1} + y^2$. The middle bit of the product $xy$ is found in the middle bit of $z^2$: $(xy)_{n-1} = (z^2)_{3n+2}$. Thus $p(n) = 3n+2$. It is clear that the mapping $\sigma$ is injective since

$$\sigma(z_i) = \begin{cases} y_i & \text{if } 0 \le i < n, \\ 0 & \text{if } n \le i < 2(n+1), \\ x_{i-2(n+1)} & \text{if } 2(n+1) \le i < 2(n+1)+n. \end{cases}$$
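The reduction in the proof of Corollary 1 can be checked exhaustively for small $n$. The sketch below assumes that bit 0 is the least significant bit, takes MULT to be bit $n-1$ of $xy$ and SQUARING to be bit $N$ of $z^2$ for an $N$-bit input $z$ (as defined above), and builds the $3n+2$ input bits of SQUARING exactly as the mapping $\sigma$ prescribes. The function names are illustrative.

```python
def mult(x: int, y: int, n: int) -> int:
    """MULT: bit n-1 of the product of two n-bit integers."""
    return (x * y >> (n - 1)) & 1


def squaring(z_bits) -> int:
    """SQUARING: z_bits[i] is bit i of an N-bit integer z; returns bit N of z^2."""
    N = len(z_bits)
    z = sum(b << i for i, b in enumerate(z_bits))
    return (z * z >> N) & 1


def sigma_mult_to_squaring(x: int, y: int, n: int):
    """The 3n+2 input bits z_0..z_{3n+1} of z = x * 2^(2(n+1)) + y."""
    bits = []
    for i in range(3 * n + 2):
        if i < n:
            bits.append((y >> i) & 1)  # z_i = y_i
        elif i < 2 * (n + 1):
            bits.append(0)  # constant 0
        else:
            bits.append((x >> (i - 2 * (n + 1))) & 1)  # z_i = x_{i-2(n+1)}
    return bits


if __name__ == "__main__":
    ok = all(
        mult(x, y, n) == squaring(sigma_mult_to_squaring(x, y, n))
        for n in range(1, 6)
        for x in range(2 ** n)
        for y in range(2 ** n)
    )
    print(ok)  # True
```

The check succeeds because $y^2 < 2^{2n}$ lies entirely below position $2n+3$, while $xy\,2^{2n+3}$ ends below position $4n+3$, before $x^2 2^{4n+4}$ begins. The three terms therefore never overlap, and bit $n-1$ of $xy$ lands in position $3n+2$.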


Corollary 2 Any read-once branching program for computing the function INVERSION has size at least $2^{\Omega(\sqrt{n})}$.

Proof: We verify that the reduction in [We93] shows $\mathrm{SQUARING} \le_{\mathrm{r.o.}} \mathrm{INVERSION}$ with polynomial $p(n) = 17n + 1$.

The reduction $\mathrm{SQUARING} \le_{\mathrm{proj}} \mathrm{INVERSION}$ reduces the problem of computing the square of an $n$-bit integer $m$ to the problem of computing $1/(1-x) = 1 + x + x^2 + x^3 + \cdots$, where

$$1 - x = 1 - m 2^{-4n} - 2^{-10n},$$

which is a $10n$-bit number slightly less than 1. The proof in [We93] shows that the product $m^2$ lies in bit positions $-6n-1$ through $-8n$ in $1/(1-x)$, its middle bit being in position $-7n$. By instead computing the inverse of $2^{-7n}(1-x)$, a $17n$-bit number, we find the middle bit of $m^2$ in position 0.


For example, working in decimal, we may compute $5^2$ (so $n = 1$) by letting $1 - x = 1 - 5 \cdot 10^{-4} - 10^{-10}$ and calculating

$$(1 - 5 \cdot 10^{-4} - 10^{-10})^{-1} = 1.000500250225\cdots,$$

from which we may recover $5^2 = 25$ in positions $-7$ and $-8$. By instead calculating $(10^{-7} \cdot (1 - 5 \cdot 10^{-4} - 10^{-10}))^{-1}$, we may find the middle digit, 2, of 25 in position 0.


To see that the mapping $\sigma$ is injective, simply notice that $1 - x = 1 - m 2^{-4n} - 2^{-10n}$ has 1's in all positions $-1$ through $-10n$, except in positions $-3n-1$ through $-4n$, where it has exactly the complements of the bits of $m$. The number $2^{-7n}(1-x)$ is similar, with extra 0's on the left. ∎
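The reduction in the proof of Corollary 2 can also be checked exhaustively for small $n$ using exact integer arithmetic. In this sketch the input to INVERSION is the list of bits $x_1, \ldots, x_{17n}$ of $2^{-7n}(1 - m2^{-4n} - 2^{-10n})$, built directly from the description above: $7n$ leading 0's, then 1's except for the complemented bits of $m$. The sketch uses exactly $17n$ input bits, and the function names are illustrative. Since the input value is $D/2^{17n}$ with $D = 2^{10n} - m2^{6n} - 1$, the integral part of its reciprocal is $\lfloor 2^{17n}/D \rfloor$.

```python
def squaring_int(m: int, n: int) -> int:
    """SQUARING for an n-bit integer m: bit n of m^2."""
    return (m * m >> n) & 1


def inversion(bits) -> int:
    """INVERSION: bits = [x_1, ..., x_N] encode x = sum_i x_i 2^-i.
    Returns the ones' bit of floor(1/x), or 0 if every x_i is 0."""
    N = len(bits)
    num = int("".join(map(str, bits)), 2)  # num = x * 2^N
    if num == 0:
        return 0
    return ((1 << N) // num) & 1  # floor(1/x) = floor(2^N / num)


def sigma_squaring_to_inversion(m: int, n: int):
    """Input bits x_1..x_{17n} of 2^{-7n} (1 - m 2^{-4n} - 2^{-10n})."""
    bits = [0] * (7 * n)  # the extra 0's on the left
    for pos in range(1, 10 * n + 1):  # positions -1 .. -10n of 1 - x
        if 3 * n < pos <= 4 * n:
            j = 4 * n - pos  # bit j of m has weight 2^(j - 4n)
            bits.append(1 - ((m >> j) & 1))  # complement of bit j of m
        else:
            bits.append(1)
    return bits


if __name__ == "__main__":
    ok = all(
        inversion(sigma_squaring_to_inversion(m, n)) == squaring_int(m, n)
        for n in range(1, 5)
        for m in range(2 ** n)
    )
    print(ok)  # True
    # The decimal example from the text: floor(10^7 / (1 - 5e-4 - 1e-10))
    print(10 ** 17 // (10 ** 10 - 5 * 10 ** 6 - 1))  # 10005002
```

The decimal line reproduces the worked example: the ones' digit of $10005002$ is 2, the middle digit of 25.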


CHAPTER 4 


Discussion and further work 


In this thesis, we have proved that integer multiplication requires exponential-size 
read-once branching programs. This fact is important for the hardware verification 
community, which would like to find a simple model in which multiplication can be 
computed with polynomial size. It was known already that most models of oblivious branching programs, which are good candidates because of the ease with which they are manipulated, require exponential size to compute multiplication.


In the course of understanding the relevant lower bounds and related models, we 
have also assembled a survey of the structure of these low-level complexity classes, 
and also of the main ideas that have been brought to bear in thinking about their 
computation. This survey also includes a few simple proofs that have not yet appeared 


in the literature. 


Further work 


There are many open questions surrounding the topics of this thesis, some of which 
have already been mentioned. We will describe some of these problems that we consider 
to be the most important, interesting, or tractable. The oldest of these problems, 


open since [FHS78], is 
Open Question 1 Is there a deterministic polynomial-time algorithm for determining the equivalence of two read-once programs?


The answer to this question loses some practical significance in light of the lower bound for multiplication and the intractability of the synthesis operations, which make read-once programs less attractive as an alternative to OBDD's in hardware verification.


Multiplication 


Although perhaps not the most interesting question, there is the possibility of improving the lower bound for multiplication. We doubt that $2^{\Omega(\sqrt{n})}$ is the true read-once complexity of MULT (recall that Bryant's lower bound for OBDD's is $2^{n/8}$), but the simple counting technique used in our proof seems limited to this lower bound. It is curious that many of the lower bounds for read-once programs achieve only $2^{\Omega(\sqrt{n})}$ if $n$ is the number of input bits; only the lower bound of [BHST87] achieves a fully exponential lower bound of $2^{\Omega(n)}$. This limitation is most likely an artifact of the proofs, but it is not well understood.


In addition to improving the bound, it may also be possible to extend the argument to show that a similar bound holds for nondeterministic read-once programs or for read-$k$-times programs.


Open Question 2 Does MULT require superpolynomial nondeterministic read-once 


programs? ... superpolynomial read-k-times programs? 


For nondeterministic read-once programs, we may define frontier edges as before. Now, however, it is not necessary for the inputs reaching an edge to induce the same subfunction on the remaining input variables, since inputs may follow several different paths. We can say, however, that the inputs in $\mathrm{MULT}^{-1}(1)$ that pass through a frontier edge are described by a function $f_1(X_1, Y_1) \wedge f_2(X_2, Y_2)$, where $X_1 \cup Y_1$ is in the boundary of the filter $\mathcal{F}$ and $X_2 \cup Y_2 = (X \cup Y) \setminus (X_1 \cup Y_1)$. Thus MULT can be written as the disjunction, over all frontier edges, of such functions. We would like to show that since each of these functions must reject all of $\mathrm{MULT}^{-1}(0)$, it can accept only an exponentially small fraction of $\mathrm{MULT}^{-1}(1)$.
That is, we would like to show that, given $\mathrm{MULT}(uv) = 1$ for all $u \in f_1^{-1}(1)$ and $v \in f_2^{-1}(1)$, it must be that $|f_1^{-1}(1) \times f_2^{-1}(1)| \le 2^{-n^k} \cdot |\mathrm{MULT}^{-1}(1)|$ for some $k > 0$. (Here, $u$ is a setting to $X_1 \cup Y_1$ and $v$ is a setting to $X_2 \cup Y_2$.) For comparison, the proof of our lower bound (Theorem 5) in effect shows that, given $\mathrm{MULT}(uv) = \mathrm{MULT}(u'v)$ for all $u, u' \in f_1^{-1}(1)$ and for all inputs $v$, it must be that $|f_1^{-1}(1)| \cdot 2^{|v|}$ is a fraction $2^{-\Omega(\sqrt{n})}$ of the total number of inputs, $2^{2n}$.


Finally, we mention that there seem to be no nontrivial lower bounds for MULT in either nondeterministic or randomized read-$k$-times models, for $k = o(n)$. Of course, in all other models considered in this thesis (OBDD's, $k$-OBDD's, $k$-IBDD's, indeed any linear-length oblivious programs, even nondeterministic, as well as non-oblivious read-once programs), it is known that exponential size is required.


The read-k-times hierarchy 


As mentioned in Section 2.4.1, it is not known whether the read-$k$-times hierarchy is strict:


Open Question 3 For some k > 2, is there a function computable by polynomial- 
size read-k-times programs but not computable by polynomial-size read-(k — 1)-times 


programs? 


In [SS93], it is conjectured that such a function is the problem of determining whether a $k$-dimensional hypergraph on $n$ nodes is $r$-regular for, say, $r = n/2$. (Recall that [SS93] proves that this problem on ordinary graphs ($k = 2$), while easily computed by read-2-times programs, requires read-once programs of size $2^{\Omega(n)}$.) The function π-MATRIX may be regarded as a special case of this problem: it is the case of determining whether a bipartite $n \times n$ graph is 1-regular. We believe that higher-dimensional versions of this latter problem should separate the read-$k$-times hierarchy.


For example, consider the 3-dimensional version, "π-CUBE", defined on an $n \times n \times n$ cube of Boolean variables, which has the value 1 exactly when each of the $n$ planes in each of the 3 dimensions contains exactly one 1. π-CUBE is easily computed with read-3-times programs. Here is a possible strategy for showing it is not computable with polynomial-size read-2-times programs. According to Theorem 1 in [BRS93], a read-2-times program for π-CUBE enables us to express the function as

$$\pi\text{-CUBE} = \bigvee_{i=1}^{\mathrm{poly}(\mathrm{BPsize})} f_{i1}(X_{i1}) \wedge f_{i2}(X_{i2}) \wedge f_{i3}(X_{i3}) \wedge f_{i4}(X_{i4}),$$

where each $X_{ij}$ is a subset of half the $n^3$ variables and each variable appears in at most two of $X_{i1}, X_{i2}, X_{i3}, X_{i4}$ for each $i$. We would like to show that a function of the form $f_{i1}(X_{i1}) \wedge f_{i2}(X_{i2}) \wedge f_{i3}(X_{i3}) \wedge f_{i4}(X_{i4})$, which rejects all of $\pi\text{-CUBE}^{-1}(0)$, can accept only an exponentially small fraction of $\pi\text{-CUBE}^{-1}(1)$.


Since each variable is in two of the $X_{ij}$, one of the three partitions $X_{i1} \cup X_{i2} \mid X_{i3} \cup X_{i4}$, $X_{i1} \cup X_{i3} \mid X_{i2} \cup X_{i4}$, and $X_{i1} \cup X_{i4} \mid X_{i2} \cup X_{i3}$ contains that variable on only one side of the partition ("fails to split" that variable). It follows that one of these partitions fails to split at least 1/3 of the variables. From this, we may argue further that for one of these partitions, there are at least 1/6 of the variables, $S$, that appear only on one side of the partition and at least 1/6 of the variables, $T$, that appear only on the other side. Thus, we may write (if the best partition is $X_{i1} \cup X_{i2} \mid X_{i3} \cup X_{i4}$)

$$f_{i1}(X_{i1}) \wedge f_{i2}(X_{i2}) \wedge f_{i3}(X_{i3}) \wedge f_{i4}(X_{i4}) = f_i'(X_{i1} \cup X_{i2}) \wedge f_i''(X_{i3} \cup X_{i4}) = f'(X \setminus S) \wedge f''(X \setminus T).$$

Since $S$ and $T$ each has more than 1/8 of all the variables, there must be many coplanar pairs $(s, t) \in S \times T$. This function cannot accept two inputs $x$ and $y$ that have $s = 1$ and $t = 1$, respectively, if $x$ and $y$ agree on the variables outside $S \cup T$, since then it would also accept the input (which should be rejected) that looks like $x$ on $S$ and like $y$ on $T$. Furthermore, the fraction of inputs in $\pi\text{-CUBE}^{-1}(1)$ that have all 0's in a given $\epsilon n \times \epsilon n \times \epsilon n$ subcube is exponentially small in $n$, for $\epsilon$ constant. It should be possible to combine these facts to obtain the desired lower bound.
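For intuition about the cube function defined above, here is a brute-force evaluator that follows the definition literally. It is a sketch: the function names `pi_cube` and `count_accepted` and the nested-list encoding `cube[i][j][k]` are our own choices. An accepted cube has its $n$ 1's in cells $(i, \pi_1(i), \pi_2(i))$ for two permutations $\pi_1, \pi_2$, so there should be $(n!)^2$ accepted inputs. The exhaustive count confirms this for $n \le 2$, the only sizes for which enumerating all $2^{n^3}$ inputs is practical.

```python
from itertools import product


def pi_cube(cube, n):
    """cube[i][j][k] in {0, 1}. Returns 1 iff each of the n planes in each of
    the 3 dimensions contains exactly one 1."""
    for axis in range(3):
        for p in range(n):
            ones = sum(
                cube[i][j][k]
                for i, j, k in product(range(n), repeat=3)
                if (i, j, k)[axis] == p
            )
            if ones != 1:
                return 0
    return 1


def count_accepted(n):
    """Exhaustively count accepted cubes (feasible only for n <= 2)."""
    cells = list(product(range(n), repeat=3))
    total = 0
    for bits in product((0, 1), repeat=n ** 3):
        cube = [[[0] * n for _ in range(n)] for _ in range(n)]
        for (i, j, k), b in zip(cells, bits):
            cube[i][j][k] = b
        total += pi_cube(cube, n)
    return total


if __name__ == "__main__":
    print(count_accepted(1), count_accepted(2))  # 1 4, i.e. (n!)^2
```

The evaluator reads every variable three times, once per dimension, which is the read-3-times upper bound mentioned above.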


Read-once reductions 


Read-once reductions appear to be rather limited in their utility. It is not clear, for 
example, how to use them even to show that directed s,t-connectivity does not have 


polynomial-size read-once programs. (This function, being NL-complete, is not known 
to have polynomial-size branching programs at all, regardless of restrictions on reading
variables.) We may construct a branching program of size O(n?) for MULT in which 
there is a s,t-path if and only if MULT is 1, but since many edges are labelled with the 
same MULT variable, a computation that reads each edge variable once in fact reads 
the variables of MULT many times. In other words, this is a projection reduction in 


which the variable mapping does not have the necessary injectivity property. 


The Fourier spectrum 


It is an interesting question whether there is any correlation between the Fourier spec- 


trum of a function and the size of its OBDD’s. 


Open Question 4 Is there a nice correlation between some property of a function's Fourier spectrum and the size of its OBDD's?


In particular, it would be useful to know which coefficients are the largest, as this is 
the information that is used in the remarkable algorithms for learning functions with 
shallow decision trees or small constant-depth circuits. As explained in Section 2.7.2, 
the correlations found between such functions and the properties of their spectra do not hold for OBDD's.


The ordering problem for OBDD’s 


One of the most useful research directions, as far as the hardware verification com- 
munity is concerned, is further analysis of the variable ordering problem described in 
Section 2.7.1. Now that it is known to be NP-complete, approximation algorithms—or 


results demonstrating the hardness of approximability—are of most interest. 


Open Question 5 Is there a reasonable algorithm (in P, RP, or BPP) which, given an OBDD, finds another OBDD (possibly obeying a different ordering of the variables) with size within a bounded factor of optimal?


Randomized algorithms for this problem should also be considered. 
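The strong dependence of OBDD size on the variable ordering, which is what makes the ordering problem matter, can be seen on a standard textbook example that does not come from this thesis. The function $f = x_1x_2 \vee x_3x_4 \vee x_5x_6$ has a reduced OBDD with 6 internal nodes under the ordering $x_1, \ldots, x_6$ but 14 under $x_1, x_3, x_5, x_2, x_4, x_6$. The brute-force sketch below counts nodes level by level. It uses the known characterization that the nodes labelled by the $i$-th variable of the ordering correspond to the distinct subfunctions, obtained by fixing the first $i-1$ variables, that essentially depend on that variable. It is exponential in $n$ and intended only for intuition. The function names are illustrative, and variables are indexed from 0.

```python
from itertools import product


def obdd_internal_nodes(f, n, order):
    """Count internal nodes of the reduced OBDD of f (a function of a 0/1
    list of length n) under the variable ordering `order`."""
    total = 0
    for level, var in enumerate(order):
        fixed_vars, free_vars = order[:level], order[level:]
        depending = set()
        for fixed in product((0, 1), repeat=level):
            x = [0] * n
            for v, b in zip(fixed_vars, fixed):
                x[v] = b
            table = []
            for free in product((0, 1), repeat=n - level):
                for v, b in zip(free_vars, free):
                    x[v] = b
                table.append(f(x))
            half = len(table) // 2
            # `var` == free_vars[0] varies slowest, so the two halves of the
            # table are the subfunctions for var = 0 and var = 1
            if table[:half] != table[half:]:
                depending.add(tuple(table))
        total += len(depending)
    return total


def three_pairs(x):
    """Illustrative function x0*x1 OR x2*x3 OR x4*x5 (indices from 0)."""
    return int((x[0] and x[1]) or (x[2] and x[3]) or (x[4] and x[5]))


if __name__ == "__main__":
    print(obdd_internal_nodes(three_pairs, 6, [0, 1, 2, 3, 4, 5]))  # 6
    print(obdd_internal_nodes(three_pairs, 6, [0, 2, 4, 1, 3, 5]))  # 14
```

For $k$ pairs the same two orderings give $2k$ and $2^{k+1} - 2$ internal nodes. This linear-versus-exponential gap is why an algorithm that approximates the optimal ordering would be valuable.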